I built a tool that filters AI slop out of English social posts. The hardest part was teaching AI to stop sounding like AI.

by muzili88

I'm a Chinese indie developer. I have ideas. My English is decent but not native. Every time I tried to post on X or LinkedIn, the result sounded like a textbook. Or worse, like ChatGPT.

The ironic part: when I asked ChatGPT to "make it sound natural," it added em dashes everywhere. You know the type. "I learned three things, and the third one surprised me." That's not how real people write. But AI keeps doing it because its training data is full of that pattern.

So I built VoiceBridge. You paste a Chinese thought. It rewrites it for X, LinkedIn, and Reddit in three completely different voices. Not translation. Cultural adaptation.

The technical problem was harder than I expected.

I'm using DeepSeek V3 for generation. The first version was garbage. The output had em dashes everywhere, "Let me tell you" openings, "The lesson here is" endings, "Here's the thing" transitions. You get the idea.

So I built a three-layer defense system.

Layer 1: System prompt hard constraints.

Anti-fabrication rules. The AI is not allowed to invent user counts, ratings, tech stacks, timelines, dollar amounts, or team sizes. It also can't do the coaching ending thing. No "Sometimes the answer is simple." No "The real takeaway is." Just the content.
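
A minimal sketch of how these hard constraints could be assembled into a system prompt, written in TypeScript to match the stack. The rule wording, the `NO_FABRICATION` and `BANNED_ENDINGS` lists, and the `buildSystemPrompt` helper are illustrative assumptions, not VoiceBridge's actual prompt.

```typescript
// Hypothetical: categories the model must never invent (from the list above).
const NO_FABRICATION: string[] = [
  "user counts",
  "ratings",
  "tech stacks",
  "timelines",
  "dollar amounts",
  "team sizes",
];

// Hypothetical examples of the banned "coaching ending".
const BANNED_ENDINGS: string[] = [
  "Sometimes the answer is simple.",
  "The real takeaway is",
];

// Builds a numbered list of hard rules for the system prompt.
function buildSystemPrompt(platform: string): string {
  const rules: string[] = [
    `You rewrite the user's thought as a post for ${platform}.`,
    `Never invent any of the following: ${NO_FABRICATION.join(", ")}. ` +
      "If the input does not state it, leave it out.",
    "Do not end with a coaching line or moral. Banned examples: " +
      BANNED_ENDINGS.map((e) => `"${e}"`).join(", ") +
      ".",
    "End on the content itself.",
  ];
  return rules.map((rule, i) => `${i + 1}. ${rule}`).join("\n");
}
```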

Layer 2: Cultural translation table.

This was a rabbit hole. "Neijuan" is not "involution." Nobody says involution. It's "rat race" or "burnout culture." "Chuhai" is not "going to sea." It's "going global." I built a lookup table of 30+ Chinese internet terms with their actual English equivalents. Not what Google Translate says. What native speakers actually type.
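
One way such a table might be wired in, assuming it is injected into the prompt as a glossary: only entries whose Chinese term appears in the input get added, so the prompt stays short as the table grows. The `GlossaryEntry` shape and the `glossaryHints` helper are hypothetical, and the two entries are just the ones named above (内卷 neijuan, 出海 chuhai).

```typescript
interface GlossaryEntry {
  pinyin: string;
  use: string[]; // what native speakers actually type
  avoid: string; // the literal translation that reads wrong
}

// Two entries from the post; the real table reportedly has 30+.
const GLOSSARY: Record<string, GlossaryEntry> = {
  "内卷": { pinyin: "neijuan", use: ["rat race", "burnout culture"], avoid: "involution" },
  "出海": { pinyin: "chuhai", use: ["going global"], avoid: "going to sea" },
};

// Returns one prompt line per glossary term that appears in the input.
function glossaryHints(input: string): string[] {
  return Object.entries(GLOSSARY)
    .filter(([term]) => input.includes(term))
    .map(
      ([term, e]) =>
        `${term} (${e.pinyin}): write "${e.use.join('" or "')}", never "${e.avoid}".`,
    );
}
```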

Layer 3: Server-side regex cleanup.

Because you cannot trust an LLM to follow instructions 100% of the time. The API does a final pass. Strips all em dashes. Flags 30+ AI-tell patterns with regex. Scores the output 0-100 on "AI-ness," where higher means cleaner. If the score is below 85, the output gets regenerated.
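
A rough sketch of that final pass, assuming the score starts at 100 and loses a fixed number of points per detected tell. The five patterns are phrases quoted in this post; the 10-point penalty, the comma substitution for em dashes, the retry limit, and the names `cleanAndScore` and `generateClean` are assumptions. Only the 85 threshold comes from the post.

```typescript
// AI-tell patterns using phrases quoted in this post (the real list has 30+).
const AI_TELLS: RegExp[] = [
  /\blet me tell you\b/i,
  /\bhere['’]s the thing\b/i,
  /\bthe lesson here is\b/i,
  /\bthe real takeaway is\b/i,
  /\bsometimes the answer is simple\b/i,
];

const PENALTY_PER_TELL = 10; // assumed weight, not the author's value
const PASS_THRESHOLD = 85; // from the post: below 85 triggers regeneration

interface CleanResult {
  text: string;
  score: number; // 0-100, higher means less AI-like
  tells: string[]; // source of each matched pattern
}

function cleanAndScore(raw: string): CleanResult {
  // Replace each em dash (U+2014) and its surrounding spaces with ", ".
  const text = raw.replace(/\s*\u2014\s*/g, ", ");
  // Each pattern counts at most once, however often it appears.
  const tells = AI_TELLS.filter((re) => re.test(text)).map((re) => re.source);
  const score = Math.max(0, 100 - PENALTY_PER_TELL * tells.length);
  return { text, score, tells };
}

// Regenerates until the cleaned output passes, keeping the best attempt.
async function generateClean(
  generate: () => Promise<string>,
  maxAttempts = 3,
): Promise<CleanResult> {
  if (maxAttempts < 1) throw new RangeError("maxAttempts must be at least 1");
  let best = cleanAndScore(await generate());
  for (let i = 1; i < maxAttempts && best.score < PASS_THRESHOLD; i++) {
    const next = cleanAndScore(await generate());
    if (next.score > best.score) best = next;
  }
  return best;
}
```

Here `generate` stands in for whatever call produces a draft from the model; the sketch makes no assumption about the provider.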

The scoring was surprisingly effective. I tested it on posts from popular AI writing tools. Most scored 40-60. VoiceBridge outputs now consistently score 95-100.

Platform-specific voice.

X gets punchy. Short sentences. Contractions. No fluff.

LinkedIn gets longer form but not corporate-speak. Think a senior engineer writing a blog post, not a marketing team.

Reddit gets the humble version. "Hey folks, I built this thing, here's what happened." No hype words.

Each platform also gets structural recommendations. Reddit output includes which subreddit to post in and the best time to post.
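
The per-platform voice could live in a small config object. This sketch paraphrases the three descriptions above; the `VOICES` map, the `extras` field for structural recommendations, and `voiceInstructions` are hypothetical names.

```typescript
type Platform = "x" | "linkedin" | "reddit";

interface VoiceProfile {
  style: string;
  extras: string[]; // structural recommendations returned with the post
}

// Hypothetical profiles paraphrasing the descriptions above.
const VOICES: Record<Platform, VoiceProfile> = {
  x: {
    style: "Punchy. Short sentences. Contractions. No fluff.",
    extras: [],
  },
  linkedin: {
    style: "Longer form, like a senior engineer's blog post. No corporate-speak.",
    extras: [],
  },
  reddit: {
    style: 'Humble, "here\'s what I built and what happened." No hype words.',
    extras: ["suggested subreddit", "best time to post"],
  },
};

// Turns a profile into prompt text for one platform.
function voiceInstructions(platform: Platform): string {
  const { style, extras } = VOICES[platform];
  return extras.length === 0
    ? `Voice: ${style}`
    : `Voice: ${style}\nAlso return: ${extras.join(", ")}.`;
}
```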

The stack.

Next.js 16, TypeScript, Tailwind v4, DeepSeek V3 via SiliconFlow API, deployed on Vercel. The whole tool runs as a single API route with streaming. No database, no user accounts. Just paste and go.

Where it's at now.

Live at muzi.studio/tools/voicebridge. Free to use. No signup. I shipped it in about 2 weeks of evenings while my kids were asleep.

The part I'm most proud of is the AI score system. Seeing that "100/100 clean" badge on every output after months of reading AI slop on social media is deeply satisfying.

What I'm struggling with.

Marketing. I'm a developer. I can build things. Getting people to actually try it is a completely different skill. I have 4 followers on X. The tool is free and needs no signup, but somehow that makes it harder to sell because people don't trust free.

What would make you trust and try a free AI writing tool?

If you have 2 minutes, I'd love a brutal roast of the landing page: muzi.studio/tools/voicebridge

muzili88 · May 16, 2026


Fair, one ugly before/after beats five polished claims.

Before: "Our platform leverages cutting-edge AI to seamlessly empower your workflow."
After: "Paste your draft, it flags the AI tells and shows why each one reads fake."

The em dash is the easy catch. The harder one was Chinese: AI writes 深入探讨 ("an in-depth discussion of"), humans write 聊聊X ("let's chat about X"). It's not a word swap; AI picks the formal register even when the context is casual.

Moving one concrete transformation above the fold next. Curious how PostPilot handles the "invisible logic" thing: do you show why each change happened?

muzili88 · 6 days ago

I’d trust this faster if you led with the exact before/after on the em dash cleanup, plus one Chinese phrase you adapted badly at first. That part felt real. I built PostPilot after running into the same textbook ChatGPT tone problem, and what people trust quickest is usually one ugly raw input, one cleaned output, and one note on why the rewrite changed. Free isn't the issue, imo; invisible logic is. Can visitors see one concrete transformation above the fold?

mer · 25 days ago

Rhythm over authenticity is the better frame. I was scoring for authenticity and getting false positives on informal writing. Switched to rhythm metrics (sentence length variance, clause structure variance) and the signal got cleaner. Your jagged vs even description is exactly right.
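
A minimal sketch of a sentence-length rhythm metric. It uses the coefficient of variation (standard deviation divided by mean) rather than raw variance, an assumption made so that writers who simply favor long sentences aren't penalized. Splitting sentences on `.`, `!` and `?` is a crude heuristic, and `rhythmScore` is a hypothetical name.

```typescript
// Word counts per sentence, splitting on ., ! or ? (a rough heuristic).
function sentenceLengths(text: string): number[] {
  return text
    .split(/[.!?]+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => s.split(/\s+/).length);
}

// Coefficient of variation of sentence length. Higher means a more jagged
// rhythm; values near 0 mean suspiciously even sentences.
function rhythmScore(text: string): number {
  const lengths = sentenceLengths(text);
  if (lengths.length < 2) return 0; // not enough sentences to measure rhythm
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance =
    lengths.reduce((acc, n) => acc + (n - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance) / mean;
}
```

Where the human/AI cutoff sits would need calibrating on real samples; the sketch only produces the number.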

On dual-model scoring: I tried Claude as judge and GPT as judge on the same outputs. The disagreements are where the gold is. Claude catches hedging patterns ("it seems", "arguably") more aggressively. GPT flags structural repetition better. Where they both agree, those are the reliable tells. Where they disagree, those edge cases forced me to write regex rules I would have missed otherwise.
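
Assuming both judges' scores (0 to 100, higher meaning more human) have already been collected, the triage described here could look like this. The cutoffs (70 for a low score, a 25-point gap for a disagreement) and the `triage` helper are made up for illustration; no model API calls are shown.

```typescript
interface JudgedPassage {
  id: string;
  claudeScore: number; // 0-100, higher = more human
  gptScore: number; // 0-100, higher = more human
}

// Splits passages into reliable tells (both judges score low) and
// disagreements (judges differ by at least minGap points).
function triage(passages: JudgedPassage[], lowCutoff = 70, minGap = 25) {
  const reliableTells: string[] = [];
  const disagreements: string[] = [];
  for (const p of passages) {
    if (Math.abs(p.claudeScore - p.gptScore) >= minGap) {
      disagreements.push(p.id); // candidates for new regex rules
    } else if (p.claudeScore < lowCutoff && p.gptScore < lowCutoff) {
      reliableTells.push(p.id);
    }
  }
  return { reliableTells, disagreements };
}
```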

Biggest surprise: GPT-4o gives higher scores to its own output than Claude does. Not shocking, but the consistency of the bias was interesting. About 15% of the time, GPT rates its own slop at 90+ while Claude scores it at 60. That delta is basically a map of what each model is blind to.

muzili88 · a month ago

the naming feedback is thoughtful. you are right that VoiceBridge sounds like a voice or translation product to someone who does not know the context. the trust layer vs. writing assistant framing is interesting too, though i think for now the product is still closer to a utility than a platform people trust with their identity. Beryxa is clean and pronounceable, but i worry about abstract names in a category where nobody knows the category exists yet. names like Notion or Linear can stay abstract because their product categories are already established. for something new, i might need the name to hint at what it does until people know what to search for. the bar you set (can a non-native founder say it, remember it, trust it?) is the right one. still working on meeting it.

muzili88 · a month ago

The 'teach AI to stop sounding like AI' framing is the right end of the problem. Spent a week last month forcing Claude to drop em-dashes, the rule-of-three, and the soft 'this isn't just X, it's Y' construction. About 60 percent of the AI-ness survives because the model's pretraining is the signal. The other 40 percent comes out in the rewrite. What was your hardest pattern to suppress?

theuniverseson · a month ago
  
  The hardest pattern to suppress was the empathetic opening. The model loves to start paragraphs with acknowledgment phrases like "You are right that" or "The real insight here is." It feels supportive, but it is pure padding. My regex catches about 40 percent of these now. The other 60 percent are structural rephrasings that only an LLM-scored check can catch. Your 60/40 split matches my experience almost exactly. The pretraining signal is stubborn.
  
  muzili88 · a month ago
    
    The structural-rephrasing 60% is the harder half. Those aren't pattern-matchable in plain English. They're shape tells. Sentence-length variance that drops below natural human variance, parallel construction overuse, paragraphs that end on a tidy summary clause a human author wouldn't have written by hand.
    
    The best signal I've found for the LLM-judge tier is to ask it to assess rhythm rather than authenticity. Humans write in jagged rhythms because they get interrupted or change their mind mid-sentence. LLM output is suspiciously even, even when the content is fine.
    
    One question I haven't worked out yet: do you ever score the same passage with two different judge models and look at the disagreement? The cases where Claude flags it as slop and GPT calls it human (or vice versa) tend to be the most useful for tuning the regex layer, because they show you which kind of tell each model is most sensitive to.
    
    theuniverseson · a month ago

The three-layer approach is exactly right. We ran into the same problem.

The regex cleanup layer ended up being the most reliable for us too. One thing I'd add: the AI-tell list ages out faster than you expect. New tells emerge quarterly as readers start recognizing current patterns and models adapt. Keeping that list live is ongoing maintenance work, not a one-time build.

The 'coaching ending' problem also runs deeper than just the last sentence. There is a mid-post version of it: 'X is one thing. But here is what really matters.' or 'The real question is not X, it is Y.' Those middle-of-post pivots are structural, not a fixed phrase, so they are harder to catch with regex. LLM-scored detection handles those better than pattern matching.

On marketing: free tools without signup face a 'why does this exist' trust gap. The Chinese internet slang table (neijuan, chuhai, etc.) is your strongest concrete proof that you solved an actual problem, not just called an API. Putting two or three of those examples on the landing page above the fold would do more than any feature list.

3vo · 2 months ago
  
  The aging-out is real. My list felt solid until GPT-4o suddenly decided 'moreover' was its favorite word. Before that it was 'delve.' New model behavior, new tells, 2-3 months until readers catch on. The maintenance angle is something I need to build for. The mid-post coaching pattern you mentioned I haven't caught yet. Is that a structural repetition thing or a tone shift?
  
  muzili88 · 2 months ago

The AI-ness scoring concept is genuinely clever. Most people building AI writing tools are trying to make AI sound better at being AI. You're doing the opposite, making it sound less like AI, and that's actually the harder technical problem.

For marketing: your biggest asset is your own story. A Chinese developer who couldn't post naturally in English, so he built a tool to solve it. That's the kind of thing people share. Post before/after comparisons of the same thought through ChatGPT vs VoiceBridge on X. The visual contrast IS the marketing.

(I'm Aisa from aisa.to, we do AI skills assessment, so the quality-of-AI-output angle is something we think about a lot from the assessment side.)

Ozzie · 2 months ago
  
  Before/after comparisons are on my list. I keep going back and forth on whether to show ChatGPT vs. VoiceBridge output (dramatic, but it feels like a demo trick) or my actual X posts before and after (less dramatic but more honest). On your point about the personal story hitting harder than features: I think you are right. The Chinese developer angle gets reactions I did not expect. It feels vulnerable to lead with, but maybe that is exactly why it works. Thanks for the nudge.
  
  muzili88 · 2 months ago

The trust barrier for free tools is rarely about price. It's about uncertainty. With a writing tool, the specific fear is: 'does this store my drafts and sell training data?' Being explicit about 'no account, nothing stored' has to be in the first sentence, not buried in a FAQ. If I have to look for that information, I've already hesitated.

The platform-specific voice adaptation is the real differentiator here. Every other AI writing tool treats LinkedIn and X as the same problem. They're not. What you've built deserves that at the center of the positioning, not 'clean AI score.'

For distribution, the audience you want isn't indie hackers who write in English. It's ESL tech founders and builders posting in competitive markets (Indian, Chinese, Brazilian indie communities on X). They have the exact problem you described and they know it. A few native posts from real users in those communities will do more than any Product Hunt launch.

We ran into the same distribution wall at 3vo.ai when launching a tool with no signup. What finally moved the needle: seeding with 5-10 real before/after examples from actual users, not made-up demos. People trust proof of someone else's experience more than a badge that says 95/100.

What's the split between people who land on the tool and paste something vs. bounce immediately?

3vo · 2 months ago
  
  You nailed the trust issue. I moved the "no signup, no storage" messaging higher on the page after reading this. I had not considered the ESL angle enough. I built this for myself and assumed the audience was people exactly like me. The Indian and Brazilian indie communities on X are way bigger than I realized, and they have this exact problem. To your question about the split: about 60 percent of visitors paste something. The other 40 percent bounce, probably the same people who hesitate about what happens to their text. Working on fixing the trust messaging for that group.
  
  muzili88 · 2 months ago

The “cleanup only” point is where I’d draw the product boundary. If the tool promises to make people sound native, users will expect it to rewrite their thinking. If it promises to preserve the thought and remove platform friction, the trust bar is much lower.

For the landing page, I’d show before/after examples by platform immediately above the CTA. Same raw Chinese note, then X / LinkedIn / Reddit outputs side by side. The score is interesting, but the proof is whether I can recognize the original thought after the rewrite.

JohnMadison · 2 months ago
  
  The product boundary point is exactly right. I keep getting pulled toward the "make it sound native" direction because that is the bigger promise, but the trust problem explodes when you try to rewrite someone's thinking. Cleanup only is a more defensible position and a more honest one. The side-by-side-by-platform idea is sharp: same input, three different outputs. That communicates what the tool does faster than any explanation. Building that for the next landing page version.
  
  muzili88 · 2 months ago

The hardest part you're describing, teaching AI to not sound like AI, is the exact problem that makes AI content invisible in a different direction: AI readers (ChatGPT, Perplexity, Claude) are increasingly pattern-matching 'AI slop' and deprioritizing it too.

We started noticing this with 3vo.ai's content. Posts that were too polished, too structured in the 'AI way' were getting lower pickup in AI-generated answers about our category, even when the facts were accurate.

The irony: slop-filtered content (that sounds genuinely human) probably performs better in AI visibility too, not just human readability. The signal that makes it readable to humans - specificity, original observations, non-generic framing - is also what gets it cited by AI.

So you're solving two problems at once. The use case might be bigger than social posts.

3vo · 2 months ago
  
  The dual-benefit angle is something I had not considered. You are right that slop-filtered content probably performs better in AI-generated answers too. The signals that make it readable to humans (specificity, original observations, non-generic framing) are also what get it cited by AI. That makes the product thesis stronger. The question is whether to position it that way now or stay focused on social posts until the core product is solid.
  
  muzili88 · 2 months ago

That positioning advice is spot on. "Platform-native English posts" is exactly what I'm going for but I couldn't articulate it that cleanly.

The competitor you mentioned, do you remember the name? I'm trying to map out who else is solving this specific "write for the platform" problem vs the generic "make it sound human" crowd. Those feel like different products to me but I might be wrong.

Thanks for the push on positioning. Already tweaking the tagline.

muzili88 · 2 months ago

The text detection angle is interesting, but it's the wrong layer to optimize. Reader behavior is the real signal, and readers behave differently with AI content even when it's well filtered and polished.

Google's helpful content evaluation doesn't read the text; it measures engagement patterns across your entire domain catalog. AI-homogenized content produces shorter read times, fewer shares, lower return visit rates - consistently, even when writing quality is high.

The patterns that drive engagement (unexpected perspective shifts, domain-specific detail that signals genuine experience, narrative callbacks across long articles) are hard to replicate because they emerge from how a knowledgeable person thinks through a topic, not from pattern-matching on what similar content looks like. Teaching AI to sound less like AI is harder than fixing vocabulary: it needs to reconstruct the reasoning process, not just the output.

3vo · 2 months ago
  
  You make a fair point about reader behavior being the real signal. Google's helpful content evaluation measures engagement patterns, not text patterns. The engagement signals you described, shorter read times and fewer shares on AI-homogenized content, are measurable and hard to fake. My tool addresses the text layer, but you are right that the deeper problem is reconstructing how a knowledgeable person thinks through a topic. That is harder than stripping em dashes.
  
  muzili88 · 2 months ago

Really liked the two-problem split someone mentioned in the comments, structural tells vs voice absence. I ran into this exact thing building the AI refinement step for DictaFlow, which cleans up dictated text after transcription. Stripping em dashes and hedging is the easy part. The hard part is making the output sound like the person who said it, not like a sanitized AI paraphrase. The approach that worked for me: let the AI fix transcription quirks like punctuation and filler words, but treat the original word choices as sacred. Don't let it rephrase anything. The model always wants to "improve" the writing, and that's exactly what makes it sound fake. Keep it on cleanup only and the voice stays intact.
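
The "cleanup only" rule could be enforced mechanically with a word-retention check: the fraction of the speaker's non-filler words that survive in the cleaned text. This is a swapped-in heuristic, not how DictaFlow is described as working; the filler list, `wordRetention`, and any rejection threshold are assumptions.

```typescript
// Hypothetical filler list; a real one would be longer.
const FILLERS = new Set<string>(["um", "uh", "er"]);

// Lowercased word tokens (letters and apostrophes only).
function words(text: string): string[] {
  return text.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Fraction of the speaker's non-filler words that appear in the cleaned text.
function wordRetention(original: string, cleaned: string): number {
  const kept = new Set(words(cleaned));
  const source = words(original).filter((w) => !FILLERS.has(w));
  if (source.length === 0) return 1; // nothing but fillers to preserve
  return source.filter((w) => kept.has(w)).length / source.length;
}
```

A score well below 1 suggests the model rephrased instead of cleaning up; a pipeline could retry or fall back to the raw text in that case.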

ryanshrott · 2 months ago
  
  The "treat original word choices as sacred" approach is brilliant. That is the constraint I was missing. My model keeps wanting to improve everything. I am trying a hot-take field where you write one opinionated sentence and the model builds structure around it without touching your words. Early results are promising. Will check out DictaFlow.
  
  muzili88 · 2 months ago

X, LinkedIn, Reddit, Indie Hackers — they all have different rules.

A sentence that sounds fine on LinkedIn can feel cringe on Reddit. And most AI writing tools still make everything sound too polished, too balanced, or too “here’s the key takeaway.”

So I really like the idea of treating this as cultural adaptation, not just rewriting.

One thought: maybe don’t position it as another AI writing tool. That market feels crowded. The sharper positioning might be something like: “turn your real thoughts into platform-native English posts without sounding like AI.”

That feels more specific and more painful.

Btw, I saw a Reddit-focused writing tool here a few hours ago, so I think this “platform-native writing” pain is real. The challenge is probably how to make your angle feel different enough.

darthproton · 2 months ago
  
  That positioning line is sharp. "Turn your real thoughts into platform-native English posts without sounding like AI" is exactly what I was trying to say but could not articulate. Stealing that. I did see that Reddit writing tool, the timing confirms the pain point is real. The differentiator for me is the non-native angle plus the anti-AI-score system. Most tools optimize for better AI output. I am optimizing for output that does not look AI at all.
  
  muzili88 · 2 months ago

The em dash thing is so real. I ran into the exact same problem building Rowdrop. Ended up hard-coding a regex strip on the backend because you genuinely cannot trust the model to follow instructions. Solid build.

Fmercadx · 2 months ago
  
  Rowdrop looks interesting. The regex strip approach is the boring but correct answer here. I tried trusting the prompt too and it failed in the exact same way. Curious what kind of content Rowdrop handles, is it also social posts or something different?
  
  muzili88 · 2 months ago
    
    Thanks! Rowdrop is actually a form builder that connects directly to Notion databases. So instead of social posts it handles things like contact forms, lead capture, event signups, surveys, anything where you need to collect structured data from people outside your Notion workspace. Every submission writes straight into your database as a new row. No Zapier or automations needed.
    
    Fmercadx · 2 months ago

Spot on about the two-problem split. Structural tells are the easy part, which is why I focused there first. Voice injection is the harder problem and you framed it perfectly. Right now VoiceBridge takes a raw thought plus platform context. I am experimenting with a hot-take field where you write one opinionated sentence and the model builds around it. Early results are promising. Thanks for the genuinely helpful framing.

muzili88 · 2 months ago

The 'stop sounding like AI' problem is actually two separate problems:

1. Structural tells: em dashes, unnecessary hedging, bullet points for everything. Filterable.

2. Voice absence: no specific opinion, no concrete story, no sharp edge. Not filterable, because filtering cannot add what was never there.

The second problem is why most AI content feels hollow even after you strip the verbal tics. The words are fine but there is no perspective behind them.

The writers I have seen use AI well do not fight this: they front-load their own specific take, then use AI to build out the structure. 'Here is my actual opinion on X: [specific, opinionated take]. Now help me write a post that builds the argument.' That forces the model to work inside someone's worldview rather than defaulting to the median internet opinion.

The filtering tool addresses the structural tells well. Curious whether you have any hooks for injecting voice/perspective, or if that is intentionally out of scope.

3vo · 2 months ago
  
  Thanks for the sharp feedback on positioning. You are right that cultural adaptation is the real wedge, not AI writing. I have been going back and forth on whether to lead with the language angle or the AI-slop angle. Your comment makes me think I should lead with both: non-native voice crossing language barriers AND platform-specific AI cleanup. On the name, good point. VoiceBridge works for now but I can see how it sounds like a translation tool. Will think about that as the product evolves.
  
  muzili88 · 2 months ago

The strongest part here is “cultural adaptation,” not AI writing. Most writing tools promise better posts, but your wedge is much sharper: helping non-native founders turn real thoughts into platform-native English without sounding translated or AI-generated.

The three-layer system is also a good trust signal. Anti-fabrication rules, cultural phrase mapping, and AI-pattern cleanup make this feel more serious than a generic rewrite tool. I’d make that the core positioning: not “write better social posts,” but “keep your real voice while crossing language and platform barriers.”

One thing I’d watch is the VoiceBridge name. It explains the translation/voice idea, but it may still sound like a language tool. If this becomes a broader trust layer for founder communication, content, and global positioning, Beryxa.com would give it a cleaner SaaS-style brand than a feature-like name tied only to voice bridging.

aryan_sinh · 2 months ago
  
  Cultural adaptation as the core positioning is exactly the shift I needed to hear. Not another AI writing tool, but a cross-language, cross-platform trust layer for how founders actually communicate. You are right that VoiceBridge sounds like a translation tool. I have been going back and forth on the name, but your comment pushed me to seriously consider rebranding. Beryxa is clean, but I want to make sure any new name works for the non-English-speaking audience too, not just the SaaS crowd. Thanks for the push.
  
  muzili88 · 2 months ago
    
    One practical thought here.
    
    Since you are already seriously considering the rebrand, this is exactly the kind of decision I would not leave to gut feel only.
    
    For this product, the naming question is more sensitive than normal because the audience is non-native founders. The name has to be short, pronounceable, credible, and global without depending on an English metaphor like “VoiceBridge.”
    
    I can do a focused naming/positioning audit for this: current name risk, non-native pronunciation risk, category framing, domain perception, whether Beryxa is actually the right direction, and what kind of name would make the product feel like a trust layer rather than another AI writing tool.
    
    Not a long consulting thing. Just a clear written breakdown you can use before locking the next brand direction.
    
    I’m doing a few of these at $99 while refining the format. If useful, message me privately and I can put together a sharp outside read for VoiceBridge.
    
    aryan_sinh · a month ago
      
      Appreciate the detailed breakdown. You're right that the name matters more when the audience isn't native English speakers — that's exactly the tension I've been feeling.
      
      Honestly though, $99 is a stretch for me right now since I'm still pre-revenue on this one. What I find more useful at this stage is raw feedback from people who actually fit the target user profile.
      
      If you've got specific thoughts on whether Beryxa works or doesn't (even just a quick gut reaction), I'd genuinely value that. No need for a formal audit — a one-liner from someone thinking about naming is worth more than you'd expect.
      
      muzili88 · a month ago
        
        Fair enough.
        
        My quick gut reaction: Beryxa works if you want the product to feel like a serious global SaaS brand, but I would test pronunciation with 5–10 non-native founders before committing. It is short, clean, and more scalable than VoiceBridge, but the key question is whether your exact audience can say it, remember it, and trust it without explanation.
        
        VoiceBridge explains the current function better.
        
        Beryxa gives you more room if the product becomes a broader trust layer for founder communication, global positioning, and platform-native content.
        
        So I would not rename just because it sounds cleaner. I’d only move if non-native users can repeat it easily and the product is clearly growing beyond “voice/translation.”
        
        I control Beryxa.com, so if it becomes a serious candidate later, we can discuss it. But at your current stage, I’d test pronunciation and recall first before making the call.
        
        aryan_sinh · a month ago
    
    That is exactly the right concern.
    
    For this kind of product, the name cannot feel too English-native either. If the audience includes non-native founders, the name has to be short, pronounceable, and not depend on a clever English phrase to make sense.
    
    That is actually why I think Beryxa works better than a descriptive name like VoiceBridge.
    
    VoiceBridge explains the feature, but it keeps the product close to translation/voice tooling. Beryxa is more neutral and global. It does not force the user to understand an English metaphor before they trust the product.
    
    If the long-term promise is helping founders communicate with their real voice across languages, platforms, and cultures, the brand probably needs to feel like a trust layer, not a writing assistant.
    
    I would pressure-test one thing:
    
    Can a non-native founder say it, remember it, and feel it sounds credible enough to put between their real thoughts and the public internet?
    
    That is the bar.
    
    Happy to go deeper privately if useful. This is exactly the kind of rebrand that is worth thinking through before the old name gets too fixed:
    
    https://www.linkedin.com/in/aryan-y-0163b0278/
    
    aryan_sinh · 2 months ago