RegEx Is there a solution here?
-
I have a blacklist similar to this:
idiot(a|i|e)?
This blocks the words idiot, idiota, idioti, and idiote.
But some users bypass this check by replacing some letters with others.
Ex: idi0t4
How can I make a regex list that blocks bypass attempts similar to this? -
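To see why the bypass works, here is the blacklist entry tried in Python's re module. This is an assumption for illustration: Notepad++ uses a different engine (Boost), but it should treat this simple pattern the same way. Every listed spelling is caught, yet swapping one letter for a digit is enough to slip through.

```python
import re

# The blacklist entry from the question: "idiot" plus an optional a, i or e.
blacklist = re.compile(r"idiot(a|i|e)?")

for word in ["idiot", "idiota", "idioti", "idiote", "idi0t4"]:
    status = "blocked" if blacklist.search(word) else "NOT blocked"
    print(word, status)
# The last line prints "idi0t4 NOT blocked": the digit 0 is not an "o".
```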
find what :
idiot. -
@Pan-Jan ,
Fighting blacklisted words is a difficult task. You have to consider all alternates, including letters, numerals, and other Unicode symbols. There is no “magic” regex which will be able to recognize it for you.
For a generic hint, I suggest using character classes in square brackets, like [aie], not groups with alternation, like (a|i|e). There is no reason to store the result in a group, and the class is easier to type without all the | separators.

Ignoring Unicode (for now), a case-insensitive “idiot” regex might look something like
(?i)[i|:!][d][i|:!][o0][t7][aie4]?
which says:
- (?i) : case insensitive
- [i|:!] : things that look vaguely i-shaped (the | here is used as a literal character because it looks like a capital I, not because it's being used for alternation; another good reason to avoid the group, where | has special meaning)
- [d] : things that look vaguely d-shaped
- [i|:!] : things that look vaguely i-shaped
- [o0] : things that look vaguely o-shaped
- [t7] : things that look vaguely t-shaped
- [aie4]? : 0 or 1 things that look vaguely like a vowel ending
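A quick way to try the pattern above outside Notepad++ is Python's re module, used here only as a test bench (this particular pattern is written the same way in both). The last two lines show the group point: the alternation records which letter it matched, while the class does not.

```python
import re

# The case-insensitive look-alike pattern from above, unchanged.
lookalike = re.compile(r"(?i)[i|:!][d][i|:!][o0][t7][aie4]?")

for word in ["idiot", "IDIOTA", "idi0t4", "|d!0t", "id:o7", "idle"]:
    # search() looks for the pattern anywhere in the text, like a Find.
    print(repr(word), bool(lookalike.search(word)))

# A class matches the same text as the alternation group but captures nothing.
print(re.fullmatch(r"idiot(a|i|e)?", "idiota").groups())  # ('a',)
print(re.fullmatch(r"idiot[aie]?", "idiota").groups())    # ()
```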
-
But people have ideas.
That should probably be enough.
(?-i)[iіìílӏIƖІ][dԁDƊ][iіìílӏIƖІ][oοᴏOΟ0][tΤТƬ] -
@Pan-Jan said in RegEx Is there a solution here?:
But people have ideas.
Indeed. I even had ideas, which I shared with you.
To me, it seems strange to turn off case-insensitive matching (with (?-i)) and then explicitly list both the lowercase and uppercase characters.

If I were trying to accomplish this, but for some reason couldn’t do the filtering through a command-line script instead of doing it manually for each word inside Notepad++, I would at least use a command-line script to help write each individual regex: I’d have the script ask for a word, like idiot, and then it would do an internal lookup from each letter to the list of characters that I thought were similar. (For example, that mapping would be i => "[iіìílӏIƖІ]", d => "[dԁDƊ]", ... according to your similarity rules.)

Good luck with this. Spam filters have been trying for years.
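Here is a minimal sketch of that kind of helper script in Python. The table LOOKALIKES and the functions char_class and build_pattern are hypothetical names, and the table holds only a few example characters (written as escapes so the exact code points are clear); fill it in with your own similarity rules. It emits character classes rather than alternation groups and relies on (?i) instead of listing upper-case letters by hand. Python's re and Notepad++'s Boost engine both accept (?i) and backslash-escaped characters inside [...], but test the generated pattern in Notepad++ before relying on it.

```python
import re

# Hypothetical look-alike table: each plain letter maps to characters that
# look similar to it. Extend it according to your own similarity rules.
LOOKALIKES = {
    "a": "a4@\u0430",                # \u0430 = Cyrillic small a
    "d": "d\u0501",                  # \u0501 = Cyrillic small komi de
    "e": "e3\u0435",                 # \u0435 = Cyrillic small ie
    "i": "i1l|!\u0456\u00ec\u00ed",  # Cyrillic small i, i-grave, i-acute
    "o": "o0\u03bf\u043e",           # Greek small omicron, Cyrillic small o
    "t": "t7\u0442",                 # \u0442 = Cyrillic small te
}


def char_class(chars: str) -> str:
    """Wrap characters in [...] with regex metacharacters (| - ] etc.) escaped."""
    return "[" + "".join(re.escape(c) for c in chars) + "]"


def build_pattern(word: str, optional_endings: str = "") -> str:
    """Build a case-insensitive look-alike regex for `word`.

    Each letter becomes a class of its look-alikes (unmapped letters stand
    for themselves). `optional_endings` lists letters that may appear once
    at the end, e.g. "aie" mirrors idiot(a|i|e)? from the question.
    """
    body = "".join(char_class(LOOKALIKES.get(ch, ch)) for ch in word.lower())
    if optional_endings:
        ending = "".join(LOOKALIKES.get(ch, ch) for ch in optional_endings.lower())
        body += char_class(ending) + "?"
    # (?i) lets the engine handle upper case instead of listing it by hand.
    return "(?i)" + body


pattern = build_pattern("idiot", "aie")
print(pattern)

blocker = re.compile(pattern)
for sample in ["IDIOTA", "idi0t4", "1d!0t", "id\u0456\u043et", "idle"]:
    print(repr(sample), bool(blocker.search(sample)))
```

Adding a word to the blacklist then means running build_pattern on it, rather than hand-typing every class.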
-
idiota ɪdıota iɗioti ƖDI0T4 idiơte iԁioti
There has to be a solution.
Can this be so?
[^ A-z1-9\n\r]|idiot -
[^ A-z0-9\n\r]|idi[o0]t -
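The two patterns above flip the approach: instead of listing look-alikes, they flag any character outside a small allowed set, or the plain word itself. A quick check in Python's re module (standing in for Notepad++) shows two side effects. First, Python's re compares code points in ranges, and A-z runs from A (0x41) to z (0x7A), so it also allows [ \ ] ^ _ and `; spelling the letters out as A-Za-z closes that gap. Second, any accented letter, such as the é in café, is flagged too. The name stricter is only for illustration.

```python
import re

# Pattern from the thread: any character outside the allowed set, or the word.
thread_pattern = re.compile(r"[^ A-z0-9\n\r]|idi[o0]t")
# Illustrative variant: same idea, but with the letter range split as A-Za-z.
stricter = re.compile(r"[^ A-Za-z0-9\n\r]|idi[o0]t")

for text in ["idi0t", "id\u0456ot", "hello world", "a_b", "caf\u00e9"]:
    # "a_b" is only flagged by stricter, because "_" lies inside A-z.
    print(repr(text), bool(thread_pattern.search(text)), bool(stricter.search(text)))
```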
I like the test word chosen here.