What do you use to protect against bots?

by rab

My use case is authentication.

Assume throttling etc. is already in place on the server, but I'm looking to limit hits on the email login form (aka magic link) in particular, as I'm concerned emails to random addresses could be triggered just for fun.

Do you use Cloudflare, Google captcha?
Is there a credible indie or open source solution?

Thanks

rab

posted to

Developers

on February 19, 2021

4
There's a lot of things you can do, but realistically, for starters, I would suggest against just having an "enter email" for a magic link. These can break, except when it actually comes to real usage. There's a write-up from not that long ago about not using magic links.

Additionally, captchas are a bit dumb, and they are a terrible user experience, so I generally avoid using those.

Cloudflare, or really any CDN, is great for this. Most offer something like a WAF to block attackers making requests. This is additionally fantastic because it protects your whole API, not just one URL, which hidden fields are no good at. Additionally, hidden fields don't get you anywhere, because if I use web automation, I can just pull the value out of the web form before making the submission, so you aren't blocking anything.

There are two additional strategies for this:
- Use an IAM-as-a-service provider. We offer one, and have things like a WAF as well as other automated protections in place to prevent service abuse.
- Just don't worry about this problem. Sure, it's the same as saying "Have you tried not having this problem?", but what's the problem with allowing bots to request things? If it really is a problem, I would look into the "why". Why is this an issue for you?
Most of the time the answer is actually "Well, it's a good thing to do, right?" And in truth, sure, but it's also really expensive to implement. So what value are you getting out of that? Most services don't actually benefit from this, and with any amount of traffic + scale, this is going to be so much lower than real users' usage.
wparad

·
5 years ago
1. 1
  
  #7 from your link captures why I asked. Other points not so much. Agree with your "is it really a problem worth addressing now" comment. Thanks.
  
  rab
  
  ·
  5 years ago
2

Cloudflare + a honeypot field such as:

<input type="hidden" name="type_here_please" value="" aria-hidden="true">

This should limit bot submissions etc.

ThePeterMick

·
5 years ago
1. 3
  
  I have something similar as my honeypot. Worth noting it's important to put in the relevant accessibility attributes, or else anyone using a screen reader gets labelled as a bot.
  
  ryanGlass
  
  ·
  5 years ago
  1. 2
    
    Good point Ryan - I've updated the snippet.
    
    ThePeterMick
    
    ·
    5 years ago
1

I would recommend three things:

Cloudflare
Use of Google reCaptcha
Use of HTTPS

LisaMay

·
5 years ago
1

Reposting from a previous thread

I had the same problem when I launched my previous SaaS: automated signups from what seemed like stolen emails originating from residential IP addresses (probably breached IoT devices and whatnot).

I hate Google's captcha, so I wanted to try something different first.

I ended up using a Ruby gem called invisible captcha, which uses heuristics such as honeypot fields and time-sensitive submissions.

Roughly speaking, if someone (1) fills an invisible form field (with a random name so that it won't be populated by password managers) OR (2) submits a form too quickly (let's say within 4 seconds of opening a page), they're probably a bot, and their input should be ignored. You can optionally inform folks to retry the request if they submitted it too fast.

It was working great - not a single bogus signup after I implemented it. It won't fly if bots are using headless browsers, but most bots (and their operators) aren't sophisticated enough to pull that off.

If your language doesn't have a similar library, it won't be that hard to write a middleware replicating this functionality.
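
As a rough illustration of those two heuristics, here is a minimal Python sketch. It is not how the invisible captcha gem works internally; the field names, the secret, and the signed-timestamp approach are all assumptions made for the example.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"             # hypothetical server-side secret
HONEYPOT_FIELD = "type_here_please"   # hypothetical honeypot field name
MIN_SECONDS = 4                       # "too fast" threshold from the comment above


def issue_timestamp(now=None):
    """Return a signed 'rendered at' value to embed in a hidden form field."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}.{sig}"


def looks_like_bot(form, now=None):
    """True if the honeypot is filled OR the form came back too quickly."""
    if form.get(HONEYPOT_FIELD):
        return True
    ts, _, sig = form.get("rendered_at", "").partition(".")
    expected = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    if not ts.isdigit() or not hmac.compare_digest(sig.encode(), expected.encode()):
        return True  # missing or tampered timestamp
    current = time.time() if now is None else now
    return current - int(ts) < MIN_SECONDS
```

The page would render `issue_timestamp()` into a hidden `rendered_at` field, and the submit handler would call `looks_like_bot(form)` before doing any real work. Signing the timestamp is only there so a script cannot simply backdate it.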

jmstfv

·
5 years ago
1. 1
  
  Cool. Very useful to hear your positive experience and the solution seems straightforward. Thanks.
  
  rab
  
  ·
  5 years ago
1

We use hCaptcha. Free and works well.

trhodes

·
5 years ago
1. 1
  
  Looks the business + what a great idea! Thanks.
  
  rab
  
  ·
  5 years ago
1

Here's my story: https://www.indiehackers.com/post/whats-your-anti-spam-playbook-ff3b94468f

Months later, I can say the problem is under control. All tools I've used for fighting spam are open source.

alchemist

·
5 years ago
1. 2
  
  Useful tactics. Thanks for sharing.
  
  rab
  
  ·
  5 years ago
1

I use Cloudflare and Google reCAPTCHA for my projects/websites. Seems to do the trick for me.

boardy

·
5 years ago
1. 1
  
  Solid choices. Thanks.
  
  rab
  
  ·
  5 years ago
1

Honeypot + throttling.

For a magic link, you could also throttle the emails.
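
Throttling per email address could look something like the sketch below: a small in-memory sliding window. The limit, the window, and the function name are hypothetical, and a real deployment would probably keep the counters in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque

MAX_EMAILS = 3          # hypothetical: sends allowed per address per window
WINDOW_SECONDS = 3600   # hypothetical: one hour

_sent = defaultdict(deque)  # normalized email -> timestamps of recent sends


def allow_magic_link(email, now=None):
    """Return True if another magic-link email may be sent to this address."""
    now = time.time() if now is None else now
    key = email.strip().lower()
    recent = _sent[key]
    # Forget sends that have fallen out of the window.
    while recent and now - recent[0] >= WINDOW_SECONDS:
        recent.popleft()
    if len(recent) >= MAX_EMAILS:
        return False
    recent.append(now)
    return True
```

The login handler would call `allow_magic_link(email)` and quietly skip the send when it returns False, so a script hammering the form cannot flood one inbox.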

kylegawley

·
5 years ago
1. 1
  
  Solid enough for my use case I would think. Thanks.
  
  rab
  
  ·
  5 years ago
1
Using firebase and 🙈 for now
On other projects, depending on the severity of the issues, I:
- Honeypot
- Block all major server networks (AWS was like 99% of bots)
- CSRF, was it? A unique ID per request.
- Block silly user agents like curl, python and anything with bot in it. Yeah, these legit happen...
- Specific lookups and blocks
hatkyinc

·
5 years ago
1. 1
  
  Can you expand on each of these. How are you honeypotting? What technique are you using to block server networks?
  
  Jasondigitized
  
  ·
  5 years ago
  1. 1
    Honeypotting - https://www.projecthoneypot.org/. In full platforms like WP you'd find ready-made plugins you can use, but otherwise it's just putting a few links around and importing the block list. There are instructions for pushing it to HTTP servers like nginx instead of doing it in the app.
    
    Block all major server networks - just download some server lists. Do mind, if you're after Google indexing for example, not to exclude that. I don't recall where exactly I got good lists, but some random examples: https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html https://gist.github.com/n0531m/f3714f6ad6ef738a3b0a
    Again, better to push it to nginx or something.
    Also note this might block some VPN/proxy services.
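
If you do end up checking this in the app rather than in nginx, a minimal Python sketch could look like this. The ranges below are reserved documentation addresses standing in for a downloaded list, and the allowlist entry is a made-up stand-in for something like a search crawler you want to keep.

```python
import ipaddress

# Hypothetical sample data: replace with ranges loaded from a published list.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32")
]
# Hypothetical carve-out, e.g. a crawler hosted inside a blocked range.
ALLOWED_RANGES = [ipaddress.ip_network("192.0.2.128/25")]


def is_blocked(ip_text):
    """True if the address falls in a blocked range and is not allowlisted."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in ALLOWED_RANGES):
        return False
    return any(ip in net for net in BLOCKED_RANGES)
```

Checking the allowlist first keeps wanted crawlers working even when their addresses sit inside a blocked network.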
    
    CSRF - more involved if you need to build your own. If it's a full system you might find a ready-made solution or a lib, but basically it's generating a unique ID for the form and expecting it back on submit, just an HTML 'hidden' field that's changing.
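
A bare-bones version of that idea, assuming some dict-like per-visitor session object (the `session` argument and the `form_token` key are hypothetical names, not any framework's API):

```python
import hmac
import secrets


def new_form_token(session):
    """Create a fresh token, remember it in the session, return it for the form."""
    token = secrets.token_urlsafe(32)
    session["form_token"] = token
    return token


def check_form_token(session, submitted):
    """Accept the submission only if it carries the token issued for this session."""
    expected = session.pop("form_token", None)  # single use: removed on first check
    if expected is None or not submitted:
        return False
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

The token goes into a hidden field when the form is rendered and is checked once on submit. As noted elsewhere in this thread, a bot that loads the page first can still read the field, so this mostly stops blind POSTs.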
    
    Block silly user agents - https://www.cyberciti.biz/faq/unix-linux-appleosx-bsd-nginx-block-user-agent/
    can try this https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker
    Also take care not to block legit crawlers you might want, like search index bots. You might want to whitelist them, like Google, MSN/Bing/whatever they call it today, Yandex maybe...
    You can scan your analytics for them...
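
The same allow-then-block logic can be sketched in app code too. The patterns here are illustrative assumptions, not a vetted list, and user agents are trivially spoofed, so this only filters out lazy scripts.

```python
import re

# Hypothetical patterns for illustration only.
ALLOWED = re.compile(r"googlebot|bingbot|yandex", re.IGNORECASE)
BLOCKED = re.compile(r"curl|python|bot", re.IGNORECASE)


def is_blocked_user_agent(user_agent):
    """True for empty or obviously scripted user agents, False for allowlisted crawlers."""
    if not user_agent:
        return True  # assumption: a missing User-Agent is treated as suspicious
    if ALLOWED.search(user_agent):
        return False
    return bool(BLOCKED.search(user_agent))
```

The allowlist is checked first because names like Googlebot would otherwise be caught by the generic "bot" pattern.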
    
    Specific lookups and blocks - it possibly won't show in 'normal' analytics, but in nginx server logs and aggregators you can see who's pulling a disproportionate amount of calls by different slices, and over time get efficient at this with ready-made queries/alerts.
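
For a quick look without an aggregator, something like this Python sketch counts requests per client. It assumes each access log line starts with the client address, as in nginx's default "combined" format; the function name is made up for the example.

```python
from collections import Counter


def top_clients(log_lines, n=3):
    """Return the n client addresses with the most requests, busiest first."""
    counts = Counter(
        line.split(" ", 1)[0]       # first field of the line: client address
        for line in log_lines
        if line.strip()             # skip blank lines
    )
    return counts.most_common(n)
```

Feeding it the lines of an access log file gives a rough list of the heaviest callers to look up or block.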
    
    hatkyinc
    
    ·
    5 years ago
2. 1
  
  Ha. 🙈 works for me. CSRF probably. I can see it becomes involved. Going to look to hand it off as much as I can. Think I see a way for the email magic-link option. Thanks.
  
  rab
  
  ·
  5 years ago
1

I use a self-built honeypot, Google Invisible Recaptcha and email verification for sign-ups but although this stops most bots, some clever bots will get past all of these.

I have tried Cloudflare for a client whose site was getting hammered because he didn't moderate comments on his blog. It was useless and slowed the site to a crawl - something like 15 seconds to show a page. Cloudflare also does really weird stuff like sending massive HEAD responses of megabytes rather than a few bytes. I ended up finding a different solution.

I had a load of bots from users at glitch.me signing up to Downtime Monkey to ping their sites and keep their free servers online all the time (they usually spin up only when in use). After battling this for a week I ended up blacklisting glitch users and auto-deleting accounts when someone tried to set up a monitor for glitch.

What's quite amusing is that the bots that are clever enough to get past all the recaptchas aren't clever enough to stop trying when they hit air. Months later I still get a few attempts each day.

ryanGlass

·
5 years ago
1. 1
  
  Interesting. Accept it's impossible to stop all. Thanks for input.
  
  rab
  
  ·
  5 years ago
1

Following 🙋‍♀️

Google captcha at this point, still in testing phase. Still a bit shocked about how captcha works in terms of user privacy. Haven't made my mind up yet about the ethical part of it. I've been hearing some great things about Cloudflare though.

maevaeverywhere

·
5 years ago
1. 2
  
  Hey. Yeah doesn't surprise me. I don't know internals of Captcha but all Google freebies are there to track, even those wonderful fonts we all love to use. Check FriendlyCaptcha in this post, although one reply seems to be challenging it.
  
  rab
  
  ·
  5 years ago
1

I suppose this is the time to plug the product I helped build: Friendly Captcha as the privacy-friendly alternative (no cookies, no tracking, it works a bit differently).

It's open core (i.e. the SaaS around it is not open source, but the building blocks are open source, as is the widget/code you would put into your website).

Guido

·
5 years ago
1. 1
  
  Absolutely the time to plug. Looks great! Really good. Will play with it to check but on first viewing looks like just the kind of thing I was looking for.
  
  rab
  
  ·
  5 years ago
2. 1
  
  Seems to be similar to geetest.com.
  
  When scraping websites with this protection, we just triggered a lambda function that executed that crypto puzzle 🤷
  
  roman_zaiats
  
  ·
  5 years ago
  1. 1
    
    I think that's fair :)
    
    We work hard to make sure it is as effective as possible while not compromising on privacy+accessibility, but no captcha will keep out all spammers/scrapers.
    
    You can buy thousands of solves for $1 for normal captchas, or spend time+resources solving crypto puzzles. There is no perfect captcha that will protect against everything reliably. Most people use captchas against untargeted abuse (e.g. scripts submitting to any internet form with some adult ad text/email), not targeted attacks from those who are willing to spend actual money/time (I would argue a reasonable amount of automated scraping is not an attack anyway).
    
    We have some more advanced stuff running in the backend too: we adapt the difficulty of the crypto puzzle based on some signals (a straightforward one being if you made many requests recently it gets more difficult).
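
To make the general idea concrete, here is a generic hashcash-style sketch in Python. It is not Friendly Captcha's actual puzzle or difficulty logic; the hash construction, the bit counts, and the "one extra bit per 10 recent requests" rule are all assumptions for illustration.

```python
import hashlib
from itertools import count


def difficulty_for(recent_requests, base_bits=12, max_bits=20):
    """Hypothetical rule: one extra bit of difficulty per 10 recent requests."""
    return min(max_bits, base_bits + recent_requests // 10)


def leading_zero_bits(digest):
    """Number of zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(challenge, bits):
    """Client side: brute-force a nonce whose hash has enough leading zero bits."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce


def verify(challenge, nonce, bits):
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits
```

Each extra bit roughly doubles the expected work for the solver, while verification stays a single hash, which is why raising the difficulty for noisy clients is cheap for the server.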
    
    Guido
    
    ·
    5 years ago
    1. 1
      
      Yup, I get it, there is no solution that would prevent bots from scraping a website. Tho this is a really interesting field to work with.
      
      Another popular approach to detecting bots is analyzing browser fingerprints. AFAIR Distil Networks provides some decent bot detection solutions.
      
      roman_zaiats
      
      ·
      5 years ago
1

I've used a captcha on a Rails site to protect a form submission. I guess that wouldn't work for a magic link though.

pooria

·
5 years ago
1. 1
  
  Hi. You could use it on the form to avoid malicious triggering of the email send, so yeah, relevant. I have updated the post to be clearer. Was it the Google captcha you used? Thanks
  
  rab
  
  ·
  5 years ago
1

This comment was deleted 3 years ago.

DeletedUser

·
5 years ago
1. 1
  
  That's sneaky. Remind me never to play you at poker. Thanks!
  
  rab
  
  ·
  5 years ago
1

This comment was deleted 3 years ago.

DeletedUser

·
5 years ago
1. 1
  
  load forms just by javascript
  Best case is to show them as a modal
  
  Neat ideas! Thanks.
  
  rab
  
  ·
  5 years ago