Profanity filters shouldn't be trusted 100%, but they can give a pretty good estimate depending on what you've trained them with.
That being said, the best solution to that problem is pairing the filters with user reports, like you say: adding a human intermediary, but also using that feedback to keep making the filter better.
ML can't (and most likely never will) achieve 100% certainty in detecting profanity, but pair it with a human and it can get those extra points needed for near perfection.
I use something like it for my own content processing, but written in PHP.
What it does is take a "bad words" list as an array, then split the checked content into an array as well, on punctuation and spaces. Each word is then checked two ways: against the bad-word list directly, and again with one character removed. An exact match gives you an error; a match after removing a character gives you a warning. This gives you a pretty good idea of what's what. It's only a base for a more advanced tool, but good enough to get you started.
All of it can be done in a few loops with native functions, so there's very little overhead.
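Roughly, the check described above could be sketched like this in Python (the original is in PHP and isn't shown here, so the word list, function name, and exact matching rules are my own placeholders):

```python
import re

# Hypothetical word list for illustration; a real one would be much longer.
BAD_WORDS = {"darn", "heck", "frick"}

def check_content(text):
    """Return (errors, warnings): errors are exact bad-word matches,
    warnings are words that match a bad word after removing one character."""
    words = [w for w in re.split(r"[\s\W]+", text.lower()) if w]
    errors, warnings = [], []
    for word in words:
        if word in BAD_WORDS:
            errors.append(word)
            continue
        # Near-miss check: dropping one character yields a bad word,
        # e.g. "dharn" -> "darn".
        for i in range(len(word)):
            if word[:i] + word[i + 1:] in BAD_WORDS:
                warnings.append(word)
                break
    return errors, warnings

errors, warnings = check_content("What the heck, you dharn fool")
# errors -> ["heck"], warnings -> ["dharn"]
```

Just a couple of loops over native data structures, which is the "no overhead" point: no external service, no ML model, nothing beyond a set lookup per candidate word.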
I've used swearjar with some success. - https://github.com/joshbuddy/swearjar
Your granny rides massive throbbing purple meat rods in prison.
None of those words were profanities.
Profanity filters are clbuttically bad ideas. It takes a human to detect profanity.
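The "clbuttically" joke refers to what naive substring replacement does (the classic Scunthorpe problem); a minimal illustration, with the replacement rule made up for the example:

```python
import re

def naive_censor(text):
    # Blind substring replacement: swap "ass" for "butt" everywhere,
    # even inside innocent words. This is the bug the joke points at.
    return re.sub(r"ass", "butt", text, flags=re.IGNORECASE)

naive_censor("a classic mistake")  # -> "a clbuttic mistake"
```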
I played around with hate speech recognition a few weeks ago and found this: https://github.com/tpawelski/hate-speech-detection
Probably a sledgehammer to crack a nut but maybe it helps.
I have seen this gem used in stuff: https://github.com/chrisvfritz/language_filter
Interesting, I don't use anything like that for IH.