I got tired of pasting client documents into ChatGPT and hoping for the best

by Arthurvdh

Hi AI users (all of us),
Quick context: I work with a lot of sensitive documents. Contracts, financial reports, stuff with names and IBANs in it. And like everyone else I started using AI to analyze them faster.
Then I realized I was just... sending all of it to OpenAI servers. Every client name. Every bank detail. The AI didn't need any of that to do its job — it just got it anyway because I pasted the whole document.
So I built something to fix it. ArcanAI strips the personal data in your browser before anything reaches the AI. Names become [NAME_1], IBANs become [IBAN_1], etc. The mapping stays local. The AI gets what it needs to work, nothing more.
It's live at arcanai.co. Free plan exists, no credit card.
Honest question for anyone who's thought about this: do you actually care about what your AI tool sees in your documents? Or is this a problem that sounds real but nobody actually loses sleep over?
Because that's the thing I'm trying to figure out: whether people feel this pain or just nod when you describe it.
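
For readers wondering what the placeholder step looks like in practice, here is a minimal sketch of the idea described in the post: swap identifiers for numbered tokens, keep the token-to-value mapping in local memory, and put the real values back into the model's answer afterwards. This is not ArcanAI's code. `redact`, `restore`, the simplified IBAN pattern, and the caller-supplied list of names are all assumptions made for illustration.

```typescript
// Hypothetical sketch: none of these names come from ArcanAI's actual code.
interface Redaction {
  text: string;                 // what would be sent to the model
  mapping: Map<string, string>; // placeholder -> original value, kept locally
}

// Deliberately simple IBAN-like pattern: country code, two check digits,
// then 11 to 30 alphanumerics with no spaces.
const IBAN_PATTERN = /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g;

function redact(input: string, knownNames: string[]): Redaction {
  const mapping = new Map<string, string>();
  const tokenByOriginal = new Map<string, string>(); // repeats reuse one token
  const counters = new Map<string, number>();

  const tokenFor = (kind: string, original: string): string => {
    const existing = tokenByOriginal.get(original);
    if (existing !== undefined) return existing;
    const next = (counters.get(kind) ?? 0) + 1;
    counters.set(kind, next);
    const token = `[${kind}_${next}]`;
    tokenByOriginal.set(original, token);
    mapping.set(token, original);
    return token;
  };

  let text = input.replace(IBAN_PATTERN, (match) => tokenFor("IBAN", match));
  for (const name of knownNames) {
    // Skip empty or absent names so they do not consume a token number.
    if (name === "" || !text.includes(name)) continue;
    text = text.split(name).join(tokenFor("NAME", name));
  }
  return { text, mapping };
}

// Puts the original values back into a model reply, entirely on the client.
function restore(text: string, mapping: Map<string, string>): string {
  let out = text;
  mapping.forEach((original, token) => {
    out = out.split(token).join(original);
  });
  return out;
}
```

Only `text` would go to the model. `mapping` stays in the page, and `restore(answer, mapping)` rebuilds the readable reply locally. A real tool would need far better name detection than a fixed list.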

Arthurvdh

on June 14, 2026

1

Glad it landed. That reposition should make the pitch land a lot harder in enterprise conversations. Good luck with it.

Ojin

·
12 days ago
·
Reply
2

The "trust gap from GPT glue-coding" wedge is real. How did you handle showing users what context the model saw? My early testers wouldn't trust the output until I exposed the actual SQL + sources inline — felt counterintuitive but conversion shifted.

danielPark

·
2 months ago
·
Reply
1. 1
  
  That's why we built the Transparent Vault. Users see exactly what the model saw. Trust comes from showing the work. AI doesn't need to know everything to be useful, and showing that boundary visually changes how people feel about using it.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
2

This healthcare thread is spot on. I hit the same wall with HealthOS — a voice app that reads your nervous-system state. Decided early that audio never leaves the phone. Users weren't asking for it; I just knew that the moment you say "sends to cloud" in anything health-related, the trust-sensitive people walk, and you torch any shot at clinical validation down the line.

Your browser-side redaction approach is the same instinct — keep the sensitive work where the data already lives. What have compliance teams made of it so far?

sabber

·
2 months ago
·
Reply
1. 1
  
  "Sends to cloud" is a trust killer, you're right. The instinct to keep sensitive work where the data already lives is the only honest architecture in health.
  On compliance teams: early days still, but the ones who've seen it get it immediately. The question stops being "is it secure" and becomes "why isn't everything built this way."
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

The tension you've identified is exactly right and it's the one that will determine whether people trust this in professional settings or not. "We strip data client-side before anything reaches the AI" is the correct architecture for this problem. But the claim only holds when it's verifiable, not just stated. The moment a lawyer or a finance director is deciding whether to use this with real client documents, they're not going to take your word for it. They'll want to see the network tab, verify nothing leaves the browser before redaction runs, or ideally see the redaction code itself. Open-sourcing the redaction module is probably the strongest trust move available to you because it turns a claim into something auditable. The security layer of a product like this isn't just a feature, it's the entire product in the eyes of anyone whose documents matter. For the people who would pay real money for what you've built, proof of the architecture is more valuable than the architecture itself.
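
One way to move from "stated" toward "verifiable" is a guard that refuses to send any payload still containing an original value from the local mapping. The sketch below is hypothetical: `findLeaks` and `guardedSend` are illustrative names, the mapping is assumed to be placeholder-to-original, and the check is a plain case-insensitive substring match.

```typescript
// Hypothetical guard, not taken from any real product.
// `mapping` is assumed to be placeholder -> original value.
function findLeaks(payload: string, mapping: Map<string, string>): string[] {
  const haystack = payload.toLowerCase();
  const leaked: string[] = [];
  mapping.forEach((original, token) => {
    if (original.length > 0 && haystack.includes(original.toLowerCase())) {
      leaked.push(token);
    }
  });
  return leaked;
}

// Wraps any sender (fetch, XHR, a test double) so that a payload containing
// a known raw value is rejected before it is handed to the sender.
function guardedSend<T>(
  payload: string,
  mapping: Map<string, string>,
  send: (body: string) => T
): T {
  const leaks = findLeaks(payload, mapping);
  if (leaks.length > 0) {
    // Report placeholders only, so the error itself does not repeat the raw values.
    throw new Error(`Refusing to send: unredacted values for ${leaks.join(", ")}`);
  }
  return send(payload);
}
```

This only catches values the redaction step already knows about, and it proves nothing to an outsider by itself. The browser's network tab and readable source remain the independent checks a reviewer would rely on.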

guy_powell

·
a month ago
·
Reply
1

The hard part for a tool like this is trust, and that's where I'd put my attention, because I looked at the page from the technical side and a few things quietly work against you.

First, the legibility. This is the hardest-to-read page I've looked at in a while. A lot of the text is grey on a near-black background, very small, and in places the contrast comes out around 1.7 where 4.5 is the minimum. I had to strain to read whole sections. For a tool asking people to hand over sensitive documents, text that looks barely there reads as unfinished, and unfinished is the last feeling you want to give here. Lifting the grey a lot, or darkening the background, plus bumping the smallest text up a size, would change the whole feel.

Second, and this is the big one for a privacy product. Your page is trying to load Google's ad and tracking scripts. The console shows it pulling in Google AdSense and Google Sign-In, and you're running Google Tag Manager too. Two problems with that. One, it's currently broken, because your own security policy blocks AdSense and the Google login, so they throw errors on load. Two, and bigger, a tool whose whole promise is "your data shouldn't go to strangers" loading an ad network is the exact contradiction a careful visitor will notice in two seconds. I'd remove the ad and tracking scripts completely. On this page they cost you far more trust than they could ever make back.

Third, the trust question under all of it. Your pitch solves "don't send your data to OpenAI" by sending it to ArcanAI instead. That only holds if the stripping genuinely happens on the user's machine and they can verify it. "Client-side, in your browser" is the right design, but in this category a claim isn't enough. I'd make it provable: open the code, or show the network tab with nothing leaving the browser, or walk through exactly how the redaction runs locally. Personally I'd much rather see the stripping happen on my own machine and never touch your servers at all. Otherwise you've moved the trust problem, not removed it, and proving that you actually solved it is your real moat.

To be fair, the underlying build is fast (performance is a 98) and the basics like the title and description are in good shape. So this is mostly about making the page look and feel as trustworthy as the idea actually is.
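
For anyone wanting to reproduce the contrast figures mentioned above, the numbers come from the WCAG 2.x contrast-ratio formula, where 4.5:1 is the AA minimum for normal-size text. Below is a small sketch of that formula. The function names are my own, and the colours you pass in are whatever you read off the page yourself.

```typescript
type RGB = [number, number, number]; // 0-255 per channel

// Linearise one sRGB channel as in the WCAG 2.x relative luminance definition.
function linearChannel(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(color: RGB): number {
  const [r, g, b] = color;
  return 0.2126 * linearChannel(r) + 0.7152 * linearChannel(g) + 0.0722 * linearChannel(b);
}

// Ratio of the lighter to the darker luminance, each offset by 0.05.
// The result ranges from 1 (identical colours) to 21 (black on white).
function contrastRatio(a: RGB, b: RGB): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// WCAG AA threshold for normal-size body text.
function meetsAA(foreground: RGB, background: RGB): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}
```

`contrastRatio([0, 0, 0], [255, 255, 255])` comes out at about 21, the maximum. Dark grey text on a near-black background lands far below the 4.5 threshold, which is the kind of result the audit describes.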

JohanBuildsFTW

·
2 months ago
·
Reply
1. 2
  
  Took your feedback seriously. The site is now light background, contrast fixed, tracking scripts removed. Genuinely useful audit, thank you.
  
  Arthurvdh
  
  ·
  a month ago
  ·
  Reply
  1. 1
    
    Love this. Pulling the tracking scripts off a privacy tool is the one that matters most, that's the page finally matching the promise. Light background and readable text on top of it, and you've gone from "looks risky" to "looks trustworthy" in a single pass. Well done, and fast.
    
    JohanBuildsFTW
    
    ·
    a month ago
    ·
    Reply
1

This is a real pain most people do not feel until they get burned — stripping PII locally before sending documents to AI feels like the obvious next step for anyone handling client data.

zasdasdasd

·
2 months ago
·
Reply
1

This is a real pain that most people don't feel until they get burned.

I think most founders are like you—they start shipping sensitive data to
AI casually, then wake up one day and realize "oh god, what have I been
sending?"
The ones who truly care are those handling customer data (payments, financial
records, health info). They care HARD. But they don't always know their
tools are exposing it.

You're solving for the paranoid/careful ones. That's a real segment.

manojit

·
2 months ago
·
Reply
1. 1
  
  "They don't always know their tools are exposing it" is the core problem. Not negligence, not carelessness. Just invisible exposure by design. The tool makes sharing frictionless because that's what serves the platform, not the user. Your clients' data isn't yours to send, even accidentally.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

This is a classic “context reset problem,” not a prompt problem.

Most AI workflows break because knowledge isn’t persistent — every new input forces the system to re-learn the situation.

I think the real upgrade isn’t better prompting, but building persistent decision context that evolves over time instead of restarting every interaction.

Omino

·
2 months ago
·
Reply
1. 1
  
  Persistent context that grows over time means more data accumulated on someone else's server. Better memory is only an upgrade if you control where that memory lives.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
  1. 1
    
    Memory's the easy win: every product can store more tomorrow. Surfacing is the hard part: noticing the user hit the same pattern 4× this month, showing it at the right moment, and giving them an action they can take today. Most AI tools ace the first and flake the second. Trust falls over because the user can't see why the storage earned its weight, and that's where indie products quietly lose 60% of the cohort.
    
    Omino
    
    ·
    2 months ago
    ·
    Reply
1

started using AI for sprint retrospectives and within a week had team conflict data and roadmap decisions sitting in a third-party context window.

ItsKondrat

·
2 months ago
·
Reply
1. 1
  
  That's a perfect example of how it happens. Thank you for sharing! Not a breach, not a hack. Just normal usage, one week in, and your team's internal conflicts and strategic decisions are sitting in someone else's infrastructure. Nobody made a bad decision, the tool just made sharing invisible. That data didn't need to leave to get the job done.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
  1. 1
    
    yeah - 'invisible' is doing a lot of work there. that's a default someone at the vendor chose, not an oversight. users never experience it as a decision because it was never framed as one.
    
    ItsKondrat
    
    ·
    2 months ago
    ·
    Reply
1

Saw the site, and you are already positioned on exactly the right axis (governance, GDPR, HIPAA, audit trails), so to your actual question (do people care or just nod): the answer splits by buyer, and your own pricing shows the split. People with a hard legal duty (a law firm, a clinic, anyone under a DPA) genuinely lose sleep over this and will pay, but they almost never buy a €14 self-serve seat. They buy through procurement, with a signed DPA, a security review, and often a BAA. The self-serve prosumer at €6 to €14 is mostly the "nods but does not change behavior" crowd.
So the demand is real, but it lives in your enterprise lane more than the individual tiers. One careful note, since you sell on compliance: redacting PII before the model does not by itself make the buyer GDPR or HIPAA compliant, because your tool still processes the identifiers in order to redact them and re-identification is possible. Leading with "reduce exposure" (which you already do) is safer than any "compliant" claim, and that precision is part of what a duty-bound buyer checks for.

tovrio

·
2 months ago
·
Reply
1. 1
  
  The re-identification point is fair and I won't dodge it. Browser-side processing still sees the identifiers to redact them. "Reduce exposure" is the right claim, not "fully compliant," and we're precise about that on purpose.
  The complete answer is TEE, Trusted Execution Environment, which is phase 2. Hardware-level isolated environment where the processing happens in a sealed enclave, invisible even to us. That closes the re-identification loop entirely. For a duty-bound buyer, that's the architecture that actually holds up under a security review.
  We're building toward that. In the meantime, browser-side redaction removes the biggest exposure vector. Not perfect, but meaningfully better than sending raw documents.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
  1. 1
    
    TEE is the right end state, and I like that you are honest about it being phase 2 rather than claiming it now.
    One thing worth planning for while you build toward it: in a real security review, a TEE on its own rarely closes the question.
    The reviewer asks for remote attestation, basically cryptographic proof that the enclave is actually running the code you say it is and nothing else. Without attestation, "invisible even to us" is a promise. With it, it is verifiable, and that is the difference a duty-bound buyer signs off on.
    Separate from the enclave, the procurement checklist still wants a DPA that names your sub-processors and the region any server-side piece runs in.
    TEE answers "can you see my data." The DPA answers "who processes it and where."
    Pairing both is what actually passes the review. The browser-side-now, TEE-later sequencing is sound in the meantime.
    
    tovrio
    
    ·
    2 months ago
    ·
    Reply
1

The gap between "people care" and "people change behavior" is real. What usually closes it isn't awareness, it's making the safe option the default one. If stripping PII is automatic and invisible, the behavior change happens without anyone having to make a decision.

gpitrella

·
2 months ago
·
Reply
1. 1
  
  Safe by default beats educated by choice every time. And the good news is AI doesn't need the raw data to get smarter. It needs signal. Anonymized documents still provide that.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

Honestly, I think it depends a lot on who's using it. As someone building apps and using AI tools constantly for code, prompts, etc, I don't think about it much for my own stuff since it's not sensitive. But for what you're describing, contracts with real names and IBANs, that's a different level entirely, and I'd guess most people genuinely don't realize they're sending that until someone points it out like you just did.
I think the "nobody loses sleep over it" feeling is more about awareness than not caring. Once people see it laid out like this, I'd bet a good chunk would care. Curious if you've gotten that reaction from people you've shown this to.

tlkt5nv3ji5

·
2 months ago
·
Reply
1. 1
  
  Yes, exactly that reaction. The moment people see their own document with names, IBANs, medical data highlighted before it gets sent, something clicks. It stops being abstract.
  The awareness gap is real. Nobody thinks they're doing something wrong because the interface makes it feel safe. But your client's IBAN reaching a third party server isn't a vibe, it's their data, not yours to share. That distinction hits differently when you see it visually.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

Honest answer: the pain is real, but it's lopsided, and the split matters for how you sell it.

We build AI agents for regulated industries at Ojin (banking, healthcare, tourism), and the pattern is stark. Individuals mostly nod. They agree it's bad, then paste the whole doc anyway because convenience wins every time. That group is hard to monetize.

In regulated B2B it's a different universe. It's not "losing sleep," it's a hard gate. The first question in every enterprise conversation is some version of "where does the data go and what do you retain." Deals stall for months on exactly this. A client name and an IBAN reaching a third party isn't a vibe, it's a compliance violation with a number attached. Anyone under GDPR handling client documents feels it daily.

So I'd validate by who, not whether. The freelancer pasting a contract feels it for three seconds. The compliance officer who has to answer for the whole team's pasting habits feels it constantly, and has budget.

One back at you: are you positioning ArcanAI as a personal browser tool or as something a firm can mandate across a team? The people who actually lose sleep are the ones accountable for everyone else's behavior, not the individual doing the pasting. That choice changes your whole go-to-market.

Ojin

·
2 months ago
·
Reply
1. 1
  
  Both, but you're pushing me to be honest about the priority. The Business plan with team dashboard exists. But my messaging has been individual-first. That's probably backwards.
  The compliance officer who has to answer for the whole team's behavior is the real buyer. "A client name and an IBAN reaching a third party is a compliance violation with a number attached" is a better pitch than anything I've written so far.
  Repositioning around that. Thanks for the push.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

Honest answer from years in the Microsoft partner and MSP world, selling into compliance-heavy orgs: for individuals and small teams, this is a 'nod but don't lose sleep' problem. They want speed, not safety. But you're not really selling to them. In regulated orgs (legal, finance, healthcare) someone absolutely loses sleep over it, it's just not the person pasting the document. It's the security or compliance lead who has to sign off on the tool, and right now their honest answer to 'can we use ChatGPT on client files' is no. I watched plenty of good deals die not because the product was weak but because it couldn't answer 'where does the data go' on a security questionnaire. Browser-side redaction is a clean answer to that exact question. So I'd flip the positioning: the end user wants it faster, the buyer wants to not get fired. Sell to the second one. The free no-credit-card plan is fine top of funnel, but the money is the team that needs this to say yes to AI at all.

GregoryScottHenson

·
2 months ago
·
Reply
1. 1
  
  This is the clearest framing I've heard. Sell to the person who doesn't want to get fired, not the one pasting the document. And browser-side redaction is a clean answer to "where does the data go" on any security questionnaire. Genuinely useful, thank you.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

The two-confidence-problem framing in the comments is sharp. There's a parallel in healthcare dictation that I run into constantly. Clinicians want to dictate notes into their EHR but the audio can't leave the device - PHI, HIPAA, all of it. So they're stuck typing or using ancient on-prem Dragon installs. The confidence problem there is the same split: confidence that the data stays local, and confidence that the transcription is actually accurate for medical terminology. I built DictaFlow Medical to handle both - local processing so nothing leaves the device, and medical-specific models for accuracy. It's the same pattern you're describing with document sanitization: don't block the AI use case, just make it safe by default. Curious if you've had healthcare folks reach out about your tool - seems like it'd be a natural fit for clinics.

ryanshrott

·
2 months ago
·
Reply
1. 1
  
  Healthcare was actually one of my first target markets. Clinicians get the risk immediately, they don't need convincing. Would love to swap notes, sounds like we're solving adjacent problems. Actually curious if there's a way to combine both approaches: anonymize before anything leaves the device. Could be worth exploring together.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

This is a real concern that most people quietly ignore. The "I'll just paste it in" habit is universal and nobody thinks about what's on the other end until something goes wrong.
The interesting thing is it's not just about privacy policy trust — it's about habit. People paste first and think later. Your tool solves a problem that exists before the user even recognizes they have it, which is a hard sell but a strong product if you can get the first use moment right.
To answer your question honestly: most people don't lose sleep over it until they have a reason to. One data breach story in their industry and suddenly everyone cares. That might be your distribution angle — not "protect your data" but "here's what happened to someone who didn't."

Elinvierno

·
2 months ago
·
Reply
1. 1
  
  "One breach story and suddenly everyone cares": nailed it. We're just waiting for the AI-gate that was always going to happen. I'd rather help people not be the case study. Actually, that's not a bad idea: a Chrome extension that anonymizes before you paste, zero behavior change required. Might just build that in phase 3! Thank you.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

The "confidence" framing in this thread is more accurate than "privacy" and I'd push it one step further.

There are actually two confidence problems for professional AI use, usually treated as one.

The first is what you're solving: confidence that sensitive data doesn't leave in the wrong form. Real problem, clearly painful for anyone with compliance exposure.

The second is less visible: confidence that the output is actually right for your specific situation. Not technically accurate but operationally relevant. The AI analyzed the contract correctly but had no idea about the relationship history with this client, your standard negotiation boundaries, or the three things your firm always flags in this type of agreement.

Both produce the same symptom: professionals not trusting AI enough to remove themselves from the loop. But they need different fixes.

What's interesting about the regulated verticals you're describing (legal, finance, healthcare) is that they have both problems acutely. The privacy one is visible and creates liability. The operational relevance one is invisible but drives the "I still have to rework everything it gives me" frustration that keeps AI stuck as a drafting assistant instead of a real workflow change.

Curious whether you're seeing the second problem come up in user conversations.

FrancisFM

·
2 months ago
·
Reply
1. 1
  
  Sharp distinction. You're right that these are two separate problems and honestly, most tools conflate them.
  On the second one: yes, it comes up. But I'd argue it's downstream of the first. Professionals don't customize AI context because they don't trust it with their data in the first place. Fix the privacy layer first, and the operational relevance problem becomes solvable.
  The deeper issue is that AI platforms aren't incentivized to solve either. More context shared = more data collected = better models for them. The misalignment is structural, not accidental.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

This is actually a more common concern than it seems, especially in legal, finance, and consulting where sensitive data is involved.

Most people don’t really think about what they paste into AI tools, but when it comes to client documents, the risk feels very real. Even if nothing “bad” happens, the idea of sending personal or financial data to external servers creates hesitation.

What’s interesting about your approach is that it doesn’t block AI use, it just makes it safer by default. That’s a more practical solution than telling users to change their behavior.

quill_ai

·
2 months ago
·
Reply
1. 1
  
  Exactly this. And the scary part is the data was already sent before anyone even questions it. Most people don't realize that what they paste into AI tools today can feed systems designed to predict and influence behavior tomorrow. Not just theirs but their clients' too. ArcanAI doesn't block AI, it just enforces a simple principle: the model sees what it needs, nothing more. We shouldn't wait for an AI-gate to start treating this seriously.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

I think the pain is very real for people handling client, legal, or financial documents. The challenge might be that the people who care the most are often the ones with compliance requirements, so trust and proof probably matter as much as the product itself.

Kumar_SDE

·
2 months ago
·
Reply
1. 1
  
  100%!!! The proof we need to build isn't just technical (audit logs, anonymization previews, zero storage). It's also cultural, helping people understand why this matters before something goes badly wrong. Because by then, it's too late. And behind the technical layer, there's a simpler truth: AI data collection primarily serves industries trying to map and monetize our behaviors...
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

The "does anyone actually care" question is
one I keep running into too as a solo builder.
What I've noticed: people don't change behavior
from awareness. They change from a moment —
a bad experience, a close call, a client asking
questions.
You can't manufacture that trigger. But you can
make sure the solution is already there, easy to
use, when it hits.
That's probably your real job right now.

SkinInLabs

·
2 months ago
·
Reply
1. 1
  
  That's probably the most useful reframe I've gotten from this thread. Stop trying to manufacture urgency, just be ready when it arrives. Though honestly: don't you think it's already happening at a small scale? Quiet incidents, clients asking uncomfortable questions, data that ended up somewhere it shouldn't have. The large-scale AI-gate moment hasn't hit yet but the small ones are already there, just not making headlines. My job is to make sure ArcanAI is the obvious answer when it does go wide.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
  1. 1
    
    The "small ones already there, just not making headlines" framing is exactly it. One uncomfortable client call is worth more than a hundred articles about AI privacy risks. Rooting for you.
    
    SkinInLabs
    
    ·
    2 months ago
    ·
    Reply
1

This is a smart approach to a real security concern. Data privacy and compliance are becoming increasingly important, especially with regulations like GDPR and CCPA. The fact that you're handling sensitive data processing locally before any AI analysis is a key differentiator. I'd be curious to know more about your target market strategy — is this positioned primarily for enterprises or SMBs?

zeevandeep

·
2 months ago
·
Reply
1. 1
  
  Both!! But the vertical split is a bit of a trap. Companies have compliance budgets so they move first, but individuals are the ones whose data actually ends up somewhere they didn't consent to. Every document you paste into ChatGPT is training data for a model that will eventually predict your behavior and sell that prediction to someone. The professional use case is urgent now. The individual one is just quieter about it.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

the honest answer to your question depends on who you're asking. individual users often don't lose sleep over it even if they say they do. compliance officers, legal teams, and anyone who has had to explain a data incident to a client do lose sleep over it. those are very different buyer conversations and the second group actually has a budget line for this problem. which one are you currently talking to and is the product priced and positioned for individual use or team use

adin_builds

·
2 months ago
·
Reply
1. 1
  
  Right now talking to individuals mostly, but the compliance pressure in legal/finance/healthcare makes those conversations easier. The deeper issue is that we're waiting for an "AI-gate" moment before people take this seriously, the same way nobody cared about Facebook's data practices until Cambridge Analytica. I'd rather not wait for that. Product is priced for both. You can check it out at https://arcanai.co/pricing
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

I think this solves a real problem, but the target audience matters.

For casual users, the convenience of pasting documents directly into an AI tool usually outweighs privacy concerns. Many people simply trust the provider or don't think about what data they're sharing.

For professionals—lawyers, accountants, consultants, HR teams, healthcare workers, finance professionals, and anyone handling client data—the concern is very real. In those environments, it's often not just about personal discomfort; it's about compliance, confidentiality, and contractual obligations.

The challenge may be that people don't wake up thinking, "I need document anonymization." They feel the pain when they're about to upload a sensitive document and hesitate. Your product could be strongest when positioned as a way to use AI safely with confidential information, rather than as a generic privacy tool.

One thing I'd be curious about: how well does the anonymization preserve context? If the AI can still accurately analyze contracts, financial reports, and other structured documents after names and identifiers are replaced, that's where the real value lies. Privacy is important, but maintaining output quality is what will make people adopt it.

muhammadammar232425

·
2 months ago
·
Reply
1. 1
  
  Quality holds: the AI gets the full document structure, just with tokens instead of real identities. But your framing is right: privacy and output quality aren't in tension, they're the same argument. If the analysis degrades when you remove the name, the AI was using the name to do something it shouldn't have needed to do. See it as a governance layer more than just a generic privacy tool. But please review how it works at https://arcanai.co/how-it-works. I'm fully transparent!!
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

this is a real problem, not a theoretical one. most people I've talked to about their AI workflows have zero process for checking what they're sending — they just paste and go. the 'does anyone actually care' question is interesting because it's usually not that people don't care, it's that they haven't thought about it. the moment you show someone their IBAN sitting in a ChatGPT prompt history they care a lot. the gap is awareness not apathy.

Ozzie

·
2 months ago
·
Reply
1. 1
  
  "Awareness not apathy" is exactly it. And the awareness gap is manufactured to some extent: AI companies benefit from you not thinking about what you're handing over. The data doesn't just sit there, it builds behavioral models that get sold to advertisers, insurers, employers. The individual user isn't the customer. They're the product. Users need to be educated about it!
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

The local-first part is the whole thing. I talk to a lot of freelancers and small agencies, and the unspoken blocker to AI adoption isn't capability, it's 'where does my client's data actually go.' The moment a contract with names and IBANs leaves the browser, half of them quietly stop using the tool. A redaction step that provably never sends raw PII to the API isn't really a feature, it's the permission slip that lets a whole category of sensitive work touch AI at all. Curious how you handle false negatives though, the one IBAN the regex misses is the one that matters.
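
On the false-negative worry: a regex alone tends to miss IBANs written with spaces, and it also flags strings that merely look like IBANs. Here is a rough sketch of one mitigation, assuming nothing about how ArcanAI actually does it. A tolerant pattern collects candidates, then the standard IBAN mod-97 check confirms them. `IBAN_CANDIDATE`, `hasValidIbanChecksum` and `findIbans` are made-up names.

```typescript
// Hypothetical sketch; pattern and function names are illustrative only.
// Allows an optional single space before each character, so grouped IBANs
// such as "GB82 WEST 1234 ..." are still picked up as candidates.
const IBAN_CANDIDATE = /\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b/g;

// IBAN check: move the first four characters to the end, read A-Z as 10-35,
// and the resulting number must leave remainder 1 when divided by 97.
function hasValidIbanChecksum(candidate: string): boolean {
  const compact = candidate.replace(/\s+/g, "").toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(compact)) return false;
  const rearranged = compact.slice(4) + compact.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    // Digits contribute one decimal digit, letters contribute two.
    const value = code >= 65 ? code - 55 : code - 48;
    remainder = (value > 9 ? remainder * 100 + value : remainder * 10 + value) % 97;
  }
  return remainder === 1;
}

function findIbans(text: string): string[] {
  const found: string[] = [];
  const candidates = text.match(IBAN_CANDIDATE) ?? [];
  for (const candidate of candidates) {
    // The greedy pattern can swallow trailing uppercase words, so drop
    // space-separated groups from the end until the checksum passes.
    const parts = candidate.split(" ");
    while (parts.length > 0) {
      const attempt = parts.join(" ");
      if (hasValidIbanChecksum(attempt)) {
        found.push(attempt);
        break;
      }
      parts.pop();
    }
  }
  return found;
}
```

The checksum mainly removes false positives. It cannot recover an IBAN the pattern never proposed, for example one typed in lowercase or split across lines, so a human review step before sending still matters.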

AtlasHQ

·
2 months ago
·
Reply
1. 1
  
  False negatives are real and the Llama layer upstream helps catch what regex misses. But there's also a manual layer, before you confirm the analysis, you see exactly what gets anonymized and you can add your own terms. Anything sensitive you want tokenized, you add it yourself. The system doesn't decide alone. The broader point though, the one IBAN that slips through matters because somewhere that IBAN is being stored, indexed, and used to build a profile. Regulation should be forcing companies to justify every piece of data they retain. It's not.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

Keeping this information private is critical, and I sometimes forget about it too. How did you build the backend?

Pai

·
2 months ago
·
Reply
1. 1
  
  The anonymization runs entirely in the browser, so the backend never sees your raw data. That's the part that matters architecturally. Anyone can write a privacy policy; fewer can show you a system where the sensitive data structurally never reaches the server.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

To answer your honest question: yes, it's a real concern — but only for a specific subset of users who've already had the uncomfortable realization you described.

We ran into this same wall building Swiftbill (invoice generator for freelancers). Our users handle client names, addresses, bank details, payment terms. When we originally designed the PDF generation to run server-side, early users pushed back hard. "Why does my client's payment info need to touch your server to make a PDF?" They were right.

We moved the entire PDF generation into the browser with pdf-lib. Nothing leaves the device. That one change removed a category of objection completely.

What you're building solves a real problem — the question is whether users feel the pain before a breach or only after. My guess: freelancers and consultants who deal with client financials feel it acutely. People processing their own notes probably don't. The ICP matters a lot here.

SwiftBill

·
2 months ago
·
Reply
1. 1
  
  The Swiftbill move is exactly the right instinct. "Why does your server need to touch this" is the question everyone should be asking every AI tool they use and almost nobody does. The answer is usually "it doesn't, we just built it that way because it was easier." That convenience is what the data economy runs on.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

I actually think about this a lot. Building a family budgeting app made me realize that financial data is one of the hardest things to ask people to share. Trust is probably a bigger challenge than the product itself.

MonySi

·
2 months ago
·
Reply
1. 1
  
  Trust is harder to build than the product itself, you're right! But I'd push back slightly: trust shouldn't be something users extend to companies, it should be something architectures make unnecessary. The goal is a system where it doesn't matter whether you trust us, because we structurally can't access your data anyway.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

Honestly I'm in the "nod and keep pasting" camp, but only because nothing made it easy. Stripping it in the browser before it leaves is the part that'd actually change my behavior.

Finlo

·
2 months ago
·
Reply
1. 1
  
  That's the exact validation I needed. Friction was the only barrier, not awareness. Building it into the workflow so it's invisible is the next problem to solve.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

I wonder if the pain isn't privacy itself but confidence. Most people don't know what they can safely upload, so they either overshare or avoid using AI altogether. A tool that removes that uncertainty could be valuable even beyond the privacy angle.

boothkeepos

·
2 months ago
·
Reply
1. 2
  
  "Confidence" is a better word for it than privacy. People know they probably shouldn't paste everything, but they have no way to know where the line is. Removing that uncertainty might be the real product.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

People care in proportion to how close they've been to an incident. Most nod; the ones who've had a client ask "where did our data go" feel it in their gut. What gets underweighted: it isn't only what OpenAI sees. That same raw document usually also lands in your own logs and whatever vector store you're indexing it into. Stripping PII before it leaves the browser shuts the loudest door, but the copies inside your own stack are where it tends to pile up unnoticed. Worth deciding what you retain, not just what you send.

chalermpon

·
2 months ago
·
Reply
1. 1
  
  That's a sharp point but ArcanAI's architecture specifically avoids that. Zero server storage, the token map never leaves the browser, results are auto-deleted in 15 minutes, nothing is indexed or retained on our side. The outbound problem and the retention problem are solved together, not separately.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
  1. 1
    
    Yeah, the 15-minute auto-delete closes the gap I was pointing at. If nothing leaves the browser with real names attached, your downstream copies only ever hold tokens too. What I would still guard hardest is the token map itself, since it's now the one artifact that can turn every [NAME_1] back into a real person. Where that map lives and how long it survives is what I would keep poking at.
    
    chalermpon
    
    ·
    2 months ago
    ·
    Reply
1

This resonates a lot. I ran into a related issue from the other direction: the output I get back from ChatGPT/Claude is often full of markdown clutter, "As an AI..." phrases, and formatting that needs cleanup before I can actually use it in client deliverables.
Ended up building a small free tool (AiCleanerText) to handle that cleanup automatically. Curious what approach you landed on for the document-pasting side: did you build something custom, or use existing tools/APIs to structure the input better?

AiTextCleaner

·
2 months ago
·
Reply
1. 1
  
  Built the anonymization custom: regex-based PII detection running entirely in the browser before anything hits an API. No server-side step for that part. The tricky bit was speed; I had to get it under 200ms or it feels like friction. Interesting that you went the other direction, cleaning the output rather than the input. Different problem, same underlying frustration with how raw AI output actually is.
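
  To make the idea concrete, here is a rough sketch of regex-based tokenization with a local token map. It is an illustration under stated assumptions, not ArcanAI's actual implementation: the IBAN pattern is deliberately simplified, and `anonymizeIbans` and `restoreTokens` are hypothetical names.

  ```typescript
  // Hypothetical sketch of in-browser IBAN tokenization. Not ArcanAI's real code.

  // Simplified pattern: 2-letter country code, 2 check digits, then 11 to 30
  // uppercase alphanumerics, optionally separated by single spaces.
  // Assumption: real detectors need per-country lengths and checksum validation,
  // and this pattern can over-match if uppercase words directly follow an IBAN.
  const IBAN_PATTERN = /\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b/g;

  interface Anonymized {
    text: string;
    // token -> original value; kept locally and never sent to the API
    tokenMap: Record<string, string>;
  }

  function anonymizeIbans(input: string): Anonymized {
    const tokenMap: Record<string, string> = {};
    const seen: Record<string, string> = {}; // normalized IBAN -> token
    let counter = 0;

    const text = input.replace(IBAN_PATTERN, (match: string): string => {
      // Treat "NL91 ABNA ..." and "NL91ABNA..." as the same IBAN.
      const normalized = match.replace(/ /g, "");
      let token = seen[normalized];
      if (token === undefined) {
        counter += 1;
        token = "[IBAN_" + counter + "]";
        seen[normalized] = token;
        tokenMap[token] = match; // remember the first spelling we saw
      }
      return token;
    });

    return { text, tokenMap };
  }

  // Put the original values back into the model's answer, locally.
  function restoreTokens(text: string, tokenMap: Record<string, string>): string {
    let out = text;
    Object.keys(tokenMap).forEach((token) => {
      out = out.split(token).join(tokenMap[token]);
    });
    return out;
  }
  ```

  Only the tokenized `text` would go to the API; the `tokenMap` stays in the browser and is applied to the response with `restoreTokens`. Reusing the same token for a repeated IBAN keeps the document coherent for the model without revealing the value.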
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
1

To answer your question: it depends entirely on the vertical, not the individual.

Solo founders building in public? They don't care at all. The data is already public.

Legal, finance, healthcare? They care intensely. Not because of personal ethics but because a data incident creates liability. The decision isn't "should I protect client data"; it's "what happens to my business if I don't."

I ran into the same constraint building Genie 007. It processes voice in the browser rather than sending audio to a server because some users flat-out won't use a tool that phones home. The local processing became a selling point, not just a technical choice.

You might find your strongest early traction in a specific regulated vertical rather than the broad "AI users" category. Accountants handling client financials. Paralegals reviewing contracts. The problem is urgent for them in a way it isn't for general users.

AmandaBrown

·
2 months ago
·
Reply
1. 1
  
  Agree on the verticals, that's where the immediate pain is. But I think there's a longer arc here too. AI tools are accumulating an insane amount of context about our lives, professional and personal. At some point there will be a breach, or worse, these models will know more about us than we do ourselves. The professional use case is urgent now. The individual one becomes urgent later. I'd rather build the habit before the incident happens.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
  1. 1
    
    The asymmetry between risk and perception is the real problem. People don't feel it until something breaks. Same logic as backups: nobody cares until they need them. The breach you're describing probably comes before most people build the habit, which means whoever builds trust infrastructure now has a real head start on the regulatory wave that follows.
    
    AmandaBrown
    
    ·
    2 months ago
    ·
    Reply
1

I'd be careful treating this as a privacy question too quickly.

The interesting question may not be whether people care what the AI sees.

It may be what has to be true before they're willing to change their current behavior.

Those sound similar, but they can lead to very different conclusions about the problem, the buyer, and the validation signal.

I wouldn't make that call casually from the current feedback.

aryan_sinh

·
2 months ago
·
Reply
1. 1
  
  Fair point. The behavior change question is the harder one: people don't stop pasting until something goes wrong. A data incident, a client complaint, a regulator asking questions. I can't manufacture that trigger. What I can do is make sure that when people do start caring, the solution is already there and easy to use.
  
  Arthurvdh
  
  ·
  2 months ago
  ·
  Reply
  1. 1
    
    Possibly.
    
    The reason I'd still be careful is that I don't think the interesting part is whether that trigger exists.
    
    I think it's the decision that follows from assuming it does.
    
    That's one of those things that can quietly shape validation, positioning, and what signals end up looking meaningful.
    
    I wouldn't try to unpack that properly in a thread.
    
    If you're curious, drop your email and I'll put together the tighter version.
    
    aryan_sinh
    
    ·
    2 months ago
    ·
    Reply
    1. 1
      
      [email protected]
      
      Arthurvdh
      
      ·
      2 months ago
      ·
      Reply
      1. 1
        
        Sent you a note by email.
        
        I think the decision underneath the trigger assumption matters more than the trigger itself.
        
        aryan_sinh
        
        ·
        2 months ago
        ·
        Reply