Hi AI users (all of us),
Quick context: I work with a lot of sensitive documents. Contracts, financial reports, stuff with names and IBANs in it. And like everyone else I started using AI to analyze them faster.
Then I realized I was just... sending all of it to OpenAI servers. Every client name. Every bank detail. The AI didn't need any of that to do its job — it just got it anyway because I pasted the whole document.
So I built something to fix it. ArcanAI strips the personal data in your browser before anything reaches the AI. Names become [NAME_1], IBANs become [IBAN_1], etc. The mapping stays local. The AI gets what it needs to work, nothing more.
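For readers who want to picture the substitution step, here is a minimal sketch of the general idea, not ArcanAI's actual code. It assumes a rough regex for the IBAN shape and a caller-supplied list of names (`knownNames`); every identifier here (`redact`, `restore`, `TokenMap`) is hypothetical. Real name detection is much harder than a fixed list.

```typescript
// Illustrative sketch only. Not ArcanAI's actual code; all names are hypothetical.

// token -> original value. This object is meant to stay in the browser.
type TokenMap = Record<string, string>;

interface Redaction {
  redacted: string;   // safe to send to the model
  tokenMap: TokenMap; // never sent anywhere
}

// Rough IBAN shape: country code, two check digits, 11 to 30 alphanumerics.
const IBAN_PATTERN = /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g;

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function redact(text: string, knownNames: string[]): Redaction {
  const tokenMap: TokenMap = {};
  const tokenByValue: Record<string, string> = {};
  const counters: Record<string, number> = {};

  // The same value always gets the same token, so the model can still
  // follow who is who across the document.
  const tokenFor = (kind: string, value: string): string => {
    const key = kind + ":" + value;
    const existing = tokenByValue[key];
    if (existing !== undefined) return existing;
    const n = (counters[kind] || 0) + 1;
    counters[kind] = n;
    const token = "[" + kind + "_" + n + "]";
    tokenByValue[key] = token;
    tokenMap[token] = value;
    return token;
  };

  let out = text.replace(IBAN_PATTERN, (m) => tokenFor("IBAN", m));
  // Longest names first so "Jane Doe" wins over a bare "Jane".
  const names = knownNames.slice().sort((a, b) => b.length - a.length);
  names.forEach((name) => {
    if (name.length === 0) return;
    out = out.replace(new RegExp(escapeRegExp(name), "g"), (m) => tokenFor("NAME", m));
  });
  return { redacted: out, tokenMap };
}

// Runs locally on the model's answer to put the real values back.
function restore(text: string, tokenMap: TokenMap): string {
  let out = text;
  Object.keys(tokenMap).forEach((token) => {
    out = out.split(token).join(tokenMap[token]);
  });
  return out;
}
```

With this sketch, `redact("Pay Jane Doe at DE89370400440532013000.", ["Jane Doe"])` would hand the model `Pay [NAME_1] at [IBAN_1].`, and `restore` would swap the real values back into the answer on the user's machine.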
It's live at arcanai.co. Free plan exists, no credit card.
Honest question for anyone who's thought about this: do you actually care about what your AI tool sees in your documents? Or is this a problem that sounds real but nobody actually loses sleep over?
Because that's the thing I'm trying to figure out: whether people feel this pain or just nod when you describe it.
The "trust gap from GPT glue-coding" wedge is real. How did you handle showing users what context the model saw? My early testers wouldn't trust the output until I exposed the actual SQL + sources inline — felt counterintuitive but conversion shifted.
This healthcare thread is spot on. I hit the same wall with HealthOS — a voice app that reads your nervous-system state. Decided early that audio never leaves the phone. Users weren't asking for it; I just knew that the moment you say "sends to cloud" in anything health-related, the trust-sensitive people walk, and you torch any shot at clinical validation down the line.
Your browser-side redaction approach is the same instinct — keep the sensitive work where the data already lives. What have compliance teams made of it so far?
"Sends to cloud" is a trust killer, you're right. The instinct to keep sensitive work where the data already lives is the only honest architecture in health.
On compliance teams: early days still, but the ones who've seen it get it immediately. The question stops being "is it secure" and becomes "why isn't everything built this way."
Honestly, I think it depends a lot on who's using it. As someone building apps and using AI tools constantly for code, prompts, etc, I don't think about it much for my own stuff since it's not sensitive. But for what you're describing, contracts with real names and IBANs, that's a different level entirely, and I'd guess most people genuinely don't realize they're sending that until someone points it out like you just did.
I think the "nobody loses sleep over it" feeling is more about awareness than not caring. Once people see it laid out like this, I'd bet a good chunk would care. Curious if you've gotten that reaction from people you've shown this to.
Honest answer: the pain is real, but it's lopsided, and the split matters for how you sell it.
We build AI agents for regulated industries at Ojin (banking, healthcare, tourism), and the pattern is stark. Individuals mostly nod. They agree it's bad, then paste the whole doc anyway because convenience wins every time. That group is hard to monetize.
In regulated B2B it's a different universe. It's not "losing sleep," it's a hard gate. The first question in every enterprise conversation is some version of "where does the data go and what do you retain." Deals stall for months on exactly this. A client name and an IBAN reaching a third party isn't a vibe, it's a compliance violation with a number attached. Anyone under GDPR handling client documents feels it daily.
So I'd validate by who, not whether. The freelancer pasting a contract feels it for three seconds. The compliance officer who has to answer for the whole team's pasting habits feels it constantly, and has budget.
One back at you: are you positioning ArcanAI as a personal browser tool or as something a firm can mandate across a team? The people who actually lose sleep are the ones accountable for everyone else's behavior, not the individual doing the pasting. That choice changes your whole go-to-market.
Honest answer from years in the Microsoft partner and MSP world, selling into compliance-heavy orgs: for individuals and small teams, this is a 'nod but don't lose sleep' problem. They want speed, not safety. But you're not really selling to them. In regulated orgs (legal, finance, healthcare) someone absolutely loses sleep over it, it's just not the person pasting the document. It's the security or compliance lead who has to sign off on the tool, and right now their honest answer to 'can we use ChatGPT on client files' is no. I watched plenty of good deals die not because the product was weak but because it couldn't answer 'where does the data go' on a security questionnaire. Browser-side redaction is a clean answer to that exact question. So I'd flip the positioning: the end user wants it faster, the buyer wants to not get fired. Sell to the second one. The free no-credit-card plan is fine top of funnel, but the money is the team that needs this to say yes to AI at all.
This is the clearest framing I've heard. Sell to the person who doesn't want to get fired, not the one pasting the document. And browser-side redaction is a clean answer to "where does the data go" on any security questionnaire. Genuinely useful, thank you.
The two-confidence-problem framing in the comments is sharp. There's a parallel in healthcare dictation that I run into constantly. Clinicians want to dictate notes into their EHR but the audio can't leave the device - PHI, HIPAA, all of it. So they're stuck typing or using ancient on-prem Dragon installs. The confidence problem there is the same split: confidence that the data stays local, and confidence that the transcription is actually accurate for medical terminology. I built DictaFlow Medical to handle both - local processing so nothing leaves the device, and medical-specific models for accuracy. It's the same pattern you're describing with document sanitization: don't block the AI use case, just make it safe by default. Curious if you've had healthcare folks reach out about your tool - seems like it'd be a natural fit for clinics.
Healthcare was actually one of my first target markets. Clinicians get the risk immediately, they don't need convincing. Would love to swap notes, sounds like we're solving adjacent problems. Actually curious if there's a way to combine both approaches, anonymize before anything leaves the device. Could be worth exploring together.
This is a real concern that most people quietly ignore. The "I'll just paste it in" habit is universal and nobody thinks about what's on the other end until something goes wrong.
The interesting thing is it's not just about privacy policy trust — it's about habit. People paste first and think later. Your tool solves a problem that exists before the user even recognizes they have it, which is a hard sell but a strong product if you can get the first use moment right.
To answer your question honestly: most people don't lose sleep over it until they have a reason to. One data breach story in their industry and suddenly everyone cares. That might be your distribution angle — not "protect your data" but "here's what happened to someone who didn't."
"One breach story and suddenly everyone cares": nailed it. We're just waiting for the AI-gate that was always going to happen. I'd rather help people not be the case study. Actually that's not a bad idea, a Chrome extension that anonymizes before you paste, zero behavior change required. Might just build that in phase 3! Thank you
The "confidence" framing in this thread is more accurate than "privacy" and I'd push it one step further.
There are actually two confidence problems for professional AI use, usually treated as one.
The first is what you're solving: confidence that sensitive data doesn't leave in the wrong form. Real problem, clearly painful for anyone with compliance exposure.
The second is less visible: confidence that the output is actually right for your specific situation. Not technically accurate but operationally relevant. The AI analyzed the contract correctly but had no idea about the relationship history with this client, your standard negotiation boundaries, or the three things your firm always flags in this type of agreement.
Both produce the same symptom: professionals not trusting AI enough to remove themselves from the loop. But they need different fixes.
What's interesting about the regulated verticals you're describing (legal, finance, healthcare) is that they have both problems acutely. The privacy one is visible and creates liability. The operational relevance one is invisible but drives the "I still have to rework everything it gives me" frustration that keeps AI stuck as a drafting assistant instead of a real workflow change.
Curious whether you're seeing the second problem come up in user conversations.
Sharp distinction. You're right that these are two separate problems and honestly, most tools conflate them.
On the second one: yes, it comes up. But I'd argue it's downstream of the first. Professionals don't customize AI context because they don't trust it with their data in the first place. Fix the privacy layer first, and the operational relevance problem becomes solvable.
The deeper issue is that AI platforms aren't incentivized to solve either. More context shared = more data collected = better models for them. The misalignment is structural, not accidental.
This is actually a more common concern than it seems, especially in legal, finance, and consulting where sensitive data is involved.
Most people don’t really think about what they paste into AI tools, but when it comes to client documents, the risk feels very real. Even if nothing “bad” happens, the idea of sending personal or financial data to external servers creates hesitation.
What’s interesting about your approach is that it doesn’t block AI use, it just makes it safer by default. That’s a more practical solution than telling users to change their behavior.
Exactly this. And the scary part is the data was already sent before anyone even questions it. Most people don't realize that what they paste into AI tools today can feed systems designed to predict and influence behavior tomorrow. Not just theirs but their clients' too. ArcanAI doesn't block AI, it just enforces a simple principle: the model sees what it needs, nothing more. We shouldn't wait for an AI-gate to start treating this seriously.
I think the pain is very real for people handling client, legal, or financial documents. The challenge might be that the people who care the most are often the ones with compliance requirements, so trust and proof probably matter as much as the product itself.
100%!!! The proof we need to build isn't just technical (audit logs, anonymization previews, zero storage). It's also cultural, helping people understand why this matters before something goes badly wrong. Because by then, it's too late. And behind the technical layer, there's a simpler truth: AI data collection primarily serves industries trying to map and monetize our behaviors...
The "does anyone actually care" question is one I keep running into too as a solo builder.
What I've noticed: people don't change behavior from awareness. They change from a moment: a bad experience, a close call, a client asking questions.
You can't manufacture that trigger. But you can make sure the solution is already there, easy to use, when it hits.
That's probably your real job right now.
That's probably the most useful reframe I've gotten from this thread. Stop trying to manufacture urgency, just be ready when it arrives. Though honestly: don't you think it's already happening at a small scale? Quiet incidents, clients asking uncomfortable questions, data that ended up somewhere it shouldn't have. The large-scale AI-gate moment hasn't hit yet but the small ones are already there, just not making headlines. My job is to make sure ArcanAI is the obvious answer when it does go wide.
The "small ones already there, just not making headlines" framing is exactly it. One uncomfortable client call is worth more than a hundred articles about AI privacy risks. Rooting for you.
This is a smart approach to a real security concern. Data privacy and compliance are becoming increasingly important, especially with regulations like GDPR and CCPA. The fact that you're handling sensitive data processing locally before any AI analysis is a key differentiator. I'd be curious to know more about your target market strategy — is this positioned primarily for enterprises or SMBs?
Both!! But the vertical split is a bit of a trap. Companies have compliance budgets so they move first, but individuals are the ones whose data actually ends up somewhere they didn't consent to. Every document you paste into ChatGPT is training data for a model that will eventually predict your behavior and sell that prediction to someone. The professional use case is urgent now. The individual one is just quieter about it.
the honest answer to your question depends on who you're asking. individual users often don't lose sleep over it even if they say they do. compliance officers, legal teams, and anyone who has had to explain a data incident to a client do lose sleep over it. those are very different buyer conversations and the second group actually has a budget line for this problem. which one are you currently talking to and is the product priced and positioned for individual use or team use
Right now talking to individuals mostly, but the compliance pressure in legal/finance/healthcare makes those conversations easier. The deeper issue is that we're waiting for an "AI-gate" moment before people take this seriously, the same way nobody cared about Facebook's data practices until Cambridge Analytica. I'd rather not wait for that. Product is priced for both. You can check it out at https://arcanai.co/pricing
I think this solves a real problem, but the target audience matters.
For casual users, the convenience of pasting documents directly into an AI tool usually outweighs privacy concerns. Many people simply trust the provider or don't think about what data they're sharing.
For professionals—lawyers, accountants, consultants, HR teams, healthcare workers, finance professionals, and anyone handling client data—the concern is very real. In those environments, it's often not just about personal discomfort; it's about compliance, confidentiality, and contractual obligations.
The challenge may be that people don't wake up thinking, "I need document anonymization." They feel the pain when they're about to upload a sensitive document and hesitate. Your product could be strongest when positioned as a way to use AI safely with confidential information, rather than as a generic privacy tool.
One thing I'd be curious about: how well does the anonymization preserve context? If the AI can still accurately analyze contracts, financial reports, and other structured documents after names and identifiers are replaced, that's where the real value lies. Privacy is important, but maintaining output quality is what will make people adopt it.
Quality holds: the AI gets the full document structure, just with tokens instead of real identities. But your framing is right: privacy and output quality aren't in tension, they're the same argument. If the analysis degrades when you remove the name, the AI was using the name to do something it shouldn't have needed to do. See it as a governance layer more than just a generic privacy tool. But please review how it works at https://arcanai.co/how-it-works. I'm fully transparent!!
this is a real problem, not a theoretical one. most people I've talked to about their AI workflows have zero process for checking what they're sending — they just paste and go. the 'does anyone actually care' question is interesting because it's usually not that people don't care, it's that they haven't thought about it. the moment you show someone their IBAN sitting in a ChatGPT prompt history they care a lot. the gap is awareness not apathy.
"Awareness not apathy" is exactly it. And the awareness gap is manufactured to some extent: AI companies benefit from you not thinking about what you're handing over. The data doesn't just sit there, it builds behavioral models that get sold to advertisers, insurers, employers. The individual user isn't the customer. They're the product. Users need to be educated about it!
The local-first part is the whole thing. I talk to a lot of freelancers and small agencies, and the unspoken blocker to AI adoption isn't capability, it's 'where does my client's data actually go.' The moment a contract with names and IBANs leaves the browser, half of them quietly stop using the tool. A redaction step that provably never sends raw PII to the API isn't really a feature, it's the permission slip that lets a whole category of sensitive work touch AI at all. Curious how you handle false negatives though, the one IBAN the regex misses is the one that matters.
False negatives are real and the Llama layer upstream helps catch what regex misses. But there's also a manual layer, before you confirm the analysis, you see exactly what gets anonymized and you can add your own terms. Anything sensitive you want tokenized, you add it yourself. The system doesn't decide alone. The broader point though, the one IBAN that slips through matters because somewhere that IBAN is being stored, indexed, and used to build a profile. Regulation should be forcing companies to justify every piece of data they retain. It's not.
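The manual layer described above can be pictured as a preview step: list what the detector found, let the user add their own terms, and tokenize nothing until they confirm. A minimal sketch, assuming a rough IBAN regex as the only automatic detector; `previewRedaction`, `PreviewItem`, and the `"CUSTOM"` kind are hypothetical names, not ArcanAI's API.

```typescript
// Illustrative sketch only. Hypothetical names, not ArcanAI's API.

interface PreviewItem {
  kind: string;        // "IBAN" for detected items, "CUSTOM" for user-added terms
  value: string;       // the exact text that would be tokenized
  occurrences: number; // how many times it appears in the document
}

// Rough IBAN shape: country code, two check digits, 11 to 30 alphanumerics.
const IBAN_SHAPE = /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g;

function previewRedaction(text: string, customTerms: string[]): PreviewItem[] {
  const items: PreviewItem[] = [];

  // Automatic layer: count each distinct IBAN-shaped string.
  const ibans: string[] = text.match(IBAN_SHAPE) || [];
  const ibanCounts: Record<string, number> = {};
  ibans.forEach((iban) => {
    ibanCounts[iban] = (ibanCounts[iban] || 0) + 1;
  });
  Object.keys(ibanCounts).forEach((iban) => {
    items.push({ kind: "IBAN", value: iban, occurrences: ibanCounts[iban] || 0 });
  });

  // Manual layer: anything the user adds is matched literally.
  const seen: Record<string, boolean> = {};
  customTerms.forEach((term) => {
    if (term.length === 0 || seen["t:" + term]) return;
    seen["t:" + term] = true;
    const occurrences = text.split(term).length - 1;
    if (occurrences > 0) {
      items.push({ kind: "CUSTOM", value: term, occurrences });
    }
  });

  return items;
}
```

The point of the design is that a miss is visible before anything is sent: if "Jane Doe" is not in the preview list, the user sees that and adds it.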
Mechanically this is the right instinct — strip the PII client-side so the model only ever sees [NAME_1], and the real identity never makes the trip. That's a fundamentally stronger guarantee than a privacy policy, because you're not asking anyone to trust a promise, you've removed the data from the path entirely. I made the same call building a small iOS memo app solo: keep the sensitive processing on-device so nothing private leaves the phone in the first place, even for things like turning speech into text. The hard part you'll hit isn't the redaction, it's the edge cases — a name embedded in a sentence, an IBAN split across a line break. People say they want privacy, but what actually earns trust is an architecture that makes the leak impossible rather than merely unlikely.
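The "IBAN split across a line break" case can be sketched as follows. This is one possible approach under stated assumptions, not how ArcanAI does it: match loosely (allowing whitespace between characters), then confirm each candidate with the standard IBAN mod-97 check so that random uppercase strings are not flagged. `findIbans`, `isValidIban`, and `LOOSE_IBAN` are hypothetical names, and the sketch does not check per-country lengths.

```typescript
// Illustrative sketch only. Hypothetical names; no per-country length check.

// IBAN shape where up to two whitespace characters (space, \n, \r\n)
// may sit between any two characters.
const LOOSE_IBAN = /\b[A-Z]{2}\d{2}(?:\s{0,2}[A-Z0-9]){11,30}\b/g;

// Standard mod-97 check: move the first four characters to the end,
// map A..Z to 10..35, and the resulting number mod 97 must equal 1.
function isValidIban(candidate: string): boolean {
  const iban = candidate.replace(/\s+/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(iban)) return false;
  const rearranged = iban.slice(4) + iban.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48; // letter or digit
    // Letters contribute two decimal digits, digits contribute one.
    remainder = (value > 9 ? remainder * 100 + value : remainder * 10 + value) % 97;
  }
  return remainder === 1;
}

function findIbans(text: string): string[] {
  const found: string[] = [];
  const matches: string[] = text.match(LOOSE_IBAN) || [];
  matches.forEach((match) => {
    const groups = match.split(/\s+/);
    // The loose pattern can swallow a following uppercase word such as "EUR",
    // so drop trailing groups until the checksum passes.
    for (let end = groups.length; end > 0; end--) {
      const candidate = groups.slice(0, end).join("");
      if (isValidIban(candidate)) {
        found.push(candidate);
        break;
      }
    }
  });
  return found;
}
```

The checksum reduces false positives; it does nothing for an IBAN the loose pattern never matches (lowercase, unusual separators), which is why a visible preview step still matters.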
"Impossible rather than merely unlikely" is the right bar. A privacy policy is a promise. An architecture that never receives the data in the first place is a proof. We shouldn't need to trust AI companies, we should build systems where that trust is irrelevant. The next phase of ArcanAI will include a TEE - Trusted Execution Environment - meaning even the server processing can't be read by anyone, including us. Hardware-level proof, not just architectural.
Keeping this information private is critical. I sometimes forget about it too. How did you build the backend?
The anonymization runs entirely in the browser, the backend never sees your raw data. That's the part that matters architecturally. Anyone can write a privacy policy, fewer can show you a system where the sensitive data structurally never reaches the server.
To answer your honest question: yes, it's a real concern — but only for a specific subset of users who've already had the uncomfortable realization you described.
We ran into this same wall building Swiftbill (invoice generator for freelancers). Our users handle client names, addresses, bank details, payment terms. When we originally designed the PDF generation to run server-side, early users pushed back hard. "Why does my client's payment info need to touch your server to make a PDF?" They were right.
We moved the entire PDF generation into the browser with pdf-lib. Nothing leaves the device. That one change removed a category of objection completely.
What you're building solves a real problem — the question is whether users feel the pain before a breach or only after. My guess: freelancers and consultants who deal with client financials feel it acutely. People processing their own notes probably don't. The ICP matters a lot here.
The Swiftbill move is exactly the right instinct. "Why does your server need to touch this" is the question everyone should be asking every AI tool they use and almost nobody does. The answer is usually "it doesn't, we just built it that way because it was easier." That convenience is what the data economy runs on.
I actually think about this a lot. Building a family budgeting app made me realize that financial data is one of the hardest things to ask people to share. Trust is probably a bigger challenge than the product itself.
Trust is harder to build than the product itself, you're right! But I'd push back slightly: trust shouldn't be something users extend to companies, it should be something architectures make unnecessary. The goal is a system where it doesn't matter whether you trust us, because we structurally can't access your data anyway.
Honestly I'm in the "nod and keep pasting" camp, but only because nothing made it easy. Stripping it in the browser before it leaves is the part that'd actually change my behavior.
That's the exact validation I needed. Friction was the only barrier, not awareness. Building it into the workflow so it's invisible is the next problem to solve.
I wonder if the pain isn't privacy itself but confidence. Most people don't know what they can safely upload, so they either overshare or avoid using AI altogether. A tool that removes that uncertainty could be valuable even beyond the privacy angle.
"Confidence" is a better word for it than privacy. People know they probably shouldn't paste everything, but they have no way to know where the line is. Removing that uncertainty might be the real product.
People care in proportion to how close they've been to an incident. Most nod; the ones who've had a client ask "where did our data go" feel it in their gut. What gets underweighted: it isn't only what OpenAI sees. That same raw document usually also lands in your own logs and whatever vector store you're indexing it into. Stripping PII before it leaves the browser shuts the loudest door, but the copies inside your own stack are where it tends to pile up unnoticed. Worth deciding what you retain, not just what you send.
That's a sharp point but ArcanAI's architecture specifically avoids that. Zero server storage, the token map never leaves the browser, results are auto-deleted in 15 minutes, nothing is indexed or retained on our side. The outbound problem and the retention problem are solved together, not separately.
Yeah, the 15-minute auto-delete closes the gap I was pointing at. If nothing leaves the browser with real names attached, your downstream copies only ever hold tokens too. What I would still guard hardest is the token map itself, since it's now the one artifact that can turn every [NAME_1] back into a real person. Where that map lives and how long it survives is what I would keep poking at.
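One way to bound "how long it survives" is to keep the token map in memory only and wipe it after a fixed window. The sketch below is a hypothetical design, not a description of ArcanAI: the thread only states that results are auto-deleted after 15 minutes, so reusing that figure for the map is an assumption, and `ExpiringTokenMap` is an invented name.

```typescript
// Illustrative sketch only. Hypothetical design, not ArcanAI's implementation.

class ExpiringTokenMap {
  private entries: Record<string, string> = {};
  private deadline: number | null = null;
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so the expiry logic can be tested without waiting.
  constructor(ttlMs: number, now?: () => number) {
    this.ttlMs = ttlMs;
    this.now = now || (() => Date.now());
  }

  set(token: string, value: string): void {
    this.purgeIfExpired();
    // The window starts at the first write and is not extended by later ones.
    if (this.deadline === null) this.deadline = this.now() + this.ttlMs;
    this.entries[token] = value;
  }

  get(token: string): string | undefined {
    this.purgeIfExpired();
    return this.entries[token];
  }

  size(): number {
    this.purgeIfExpired();
    return Object.keys(this.entries).length;
  }

  private purgeIfExpired(): void {
    if (this.deadline !== null && this.now() >= this.deadline) {
      this.entries = {};
      this.deadline = null;
    }
  }
}

// Assumed window, borrowed from the 15-minute figure mentioned for results.
const FIFTEEN_MINUTES_MS = 15 * 60 * 1000;
const sessionMap = new ExpiringTokenMap(FIFTEEN_MINUTES_MS);
sessionMap.set("[NAME_1]", "Jane Doe");
```

Because nothing is written to `localStorage` or sent over the network, closing the tab also discards the map. The trade-off is that an answer arriving after the window can no longer be de-tokenized.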
This resonates a lot. I ran into a related issue from the other direction: after getting output back from ChatGPT/Claude, it's often full of markdown clutter, "As an AI..." phrases, and formatting that needs cleanup before I can actually use it in client deliverables.
Ended up building a small free tool (AiCleanerText) to handle that cleanup automatically. Curious what approach you landed on for the document-pasting side: did you build something custom, or use existing tools/APIs to structure the input better?
Built the anonymization custom, regex-based PII detection running entirely in the browser before anything hits an API. No server-side step for that part. The tricky bit was speed, had to get it under 200ms or it feels like friction. Interesting that you went the other direction, cleaning the output rather than the input. Different problem, same underlying frustration with how raw AI output actually is.
To answer your question: it depends entirely on the vertical, not the individual.
Solo founders building in public? They don't care at all. The data is already public.
Legal, finance, healthcare? They care intensely. Not because of personal ethics but because a data incident creates liability. The decision isn't "should I protect client data" it's "what happens to my business if I don't."
I ran into the same constraint building Genie 007. It processes voice in the browser rather than sending audio to a server because some users flat-out won't use a tool that phones home. The local processing became a selling point, not just a technical choice.
You might find your strongest early traction in a specific regulated vertical rather than the broad "AI users" category. Accountants handling client financials. Paralegals reviewing contracts. The problem is urgent for them in a way it isn't for general users.
Agree on the verticals, that's where the immediate pain is. But I think there's a longer arc here too. AI tools are accumulating an insane amount of context about our lives, professional and personal. At some point there will be a breach, or worse, these models will know more about us than we do ourselves. The professional use case is urgent now. The individual one becomes urgent later. I'd rather build the habit before the incident happens.
The asymmetry between risk and perception is the real problem. People don't feel it until something breaks. Same logic as backups: nobody cares until they need them. The breach you're describing probably comes before most people build the habit, which means whoever builds trust infrastructure now has a real head start on the regulatory wave that follows.
I'd be careful treating this as a privacy question too quickly.
The interesting question may not be whether people care what the AI sees.
It may be what has to be true before they're willing to change their current behavior.
Those sound similar, but they can lead to very different conclusions about the problem, the buyer, and the validation signal.
I wouldn't make that call casually from the current feedback.
Fair point. The behavior change question is the harder one, people don't stop pasting until something goes wrong. A data incident, a client complaint, a regulator asking questions. I can't manufacture that trigger. What I can do is make sure when people do start caring, the solution is already there and easy to use.
Possibly.
The reason I'd still be careful is that I don't think the interesting part is whether that trigger exists.
I think it's the decision that follows from assuming it does.
That's one of those things that can quietly shape validation, positioning, and what signals end up looking meaningful.
I wouldn't try to unpack that properly in a thread.
If you're curious, drop your email and I'll put together the tighter version.
[email protected]
Sent you a note by email.
I think the decision underneath the trigger assumption matters more than the trigger itself.