RegEx Is there a solution here?

8 Posts 4 Posters 393 Views
  • T
    Thomas 2020
    last edited by Aug 4, 2020, 5:59 PM

    I have a blacklist similar to this:

    idiot(a|i|e)?
    This blocks the words idiot, idiota, idioti and idiote.
    But some users bypass this check by replacing some letters with look-alike characters.
    Ex: idi0t4
    How can I write a regex that blocks bypass attempts like this?

    • R
      rinku singh
      last edited by Aug 4, 2020, 6:19 PM

      find what : idiot.

      • P
        PeterJones @Thomas 2020
        last edited by PeterJones Aug 4, 2020, 6:25 PM Aug 4, 2020, 6:24 PM

        @Pan-Jan ,

        Fighting blacklisted words is a difficult task. You have to consider all alternates, including letters, numerals, and other unicode symbols. There is no “magic” regex which will be able to recognize it for you.

        For a generic hint, I suggest using character classes in square brackets [aie], not the groups-with-alternations (a|i|e). There is no reason to store the result in a group. It is easier to type the class without all the | separators.

        Ignoring unicode (for now), a case-insensitive “idiot” regex might look something like (?i)[i|:!][d][i|:!][o0][t7][aie4]?, which says

        • case insensitive
        • things that look vaguely i-shaped
          • | here is used as a literal character, because it looks like a capital i, not because it’s being used for alternation… another good reason to avoid the group where | has special meaning,
        • things that look vaguely d-shaped
        • things that look vaguely i-shaped
        • things that look vaguely o-shaped
        • things that look vaguely t-shaped
        • 0 or 1 things that look vaguely vowel-ending
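As a rough illustration of the character-class pattern above, here is a minimal sketch using Python's `re` module. Python is an assumption made only for this example; Notepad++ uses its own regex engine, so details may differ there. The helper name `find_hits` is hypothetical.

```python
import re

# The pattern from the post, unchanged. Inside [...] the | is a literal
# character, not an alternation operator.
PATTERN = re.compile(r"(?i)[i|:!][d][i|:!][o0][t7][aie4]?")

def find_hits(text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in PATTERN.finditer(text)]

if __name__ == "__main__":
    print(find_hits("You IDIOT, you idi0t4, you |d!07"))
    print(find_hits("an ideal idea"))
```

Because the pattern has no word boundaries, it also matches inside longer words: in "idiotic" it matches the leading "idioti".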
        • T
          Thomas 2020
          last edited by Aug 4, 2020, 8:10 PM

          But people have ideas.
          That should probably be enough.

          (?-i)[iіìílӏIƖІ][dԁDƊ][iіìílӏIƖІ][oοᴏOΟ0][tΤТƬ]
          
          • P
            PeterJones @Thomas 2020
            last edited by PeterJones Aug 4, 2020, 8:36 PM Aug 4, 2020, 8:35 PM

            @Pan-Jan said in RegEx Is there a solution here?:

            But people have ideas.

            Indeed. I even had ideas, which I shared with you.

            (?-i)

            To me, it seems strange to turn off case-insensitive (with (?-i)), and then explicitly list both the lowercase and uppercase characters.

            If I were trying to accomplish this, but for some reason couldn’t do the filtering through a command-line script instead of doing it manually for each word inside Notepad++, I would at least use a command-line script to help write each individual regex: I’d have the script ask for a word, like idiot, and then it would look up, for each letter, the list of characters that I thought were similar. (For example, that mapping would be i => "[iіìílӏIƖІ]", d => "[dԁDƊ]", ... according to your similarity rules.)
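The helper script described above could be sketched roughly as follows. This is not the poster's actual script: Python is an assumption, the names `LOOKALIKES` and `build_pattern` are hypothetical, and the look-alike classes are simply copied from the patterns posted earlier in the thread, so letters without an entry fall back to a literal match.

```python
import re

# Look-alike classes copied from the thread. Extend or change them
# according to your own similarity rules.
LOOKALIKES = {
    "i": "[iіìílӏIƖІ]",
    "d": "[dԁDƊ]",
    "o": "[oοᴏOΟ0]",
    "t": "[tΤТƬ]",
}

def build_pattern(word):
    """Build a regex for word, one character class per known letter."""
    parts = []
    for ch in word.lower():
        # Letters without a mapping are matched literally (escaped).
        parts.append(LOOKALIKES.get(ch, re.escape(ch)))
    return "".join(parts)

if __name__ == "__main__":
    print(build_pattern("idiot"))
```

The printed pattern can then be pasted into the Find what field. Reading the word with `input()` instead of hard-coding it would make the script interactive, as the post suggests.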

            Good luck with this. Spam filters have been trying for years.

            • T
              Thomas 2020
              last edited by Aug 5, 2020, 7:15 AM

              idiota
              ɪdıota
              iɗioti
              ƖDI0T4
              idiơte
              iԁioti
              
              

              There has to be a solution.

              Could something like this work?
              [^ A-z1-9\n\r]|idiot

              • T
                Thomas 2020
                last edited by Thomas 2020 Aug 5, 2020, 7:53 AM Aug 5, 2020, 7:51 AM

                [^ A-z0-9\n\r]|idi[o0]t
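One way to see what this last pattern catches is to try it on a few strings. The sketch below uses Python's `re` module, which is an assumption (Notepad++'s engine and its Match case option may behave differently), and `is_flagged` is a hypothetical helper name. Note that the range `A-z` covers ASCII codes 65 to 122, so it also includes the characters `[ \ ] ^ _` and the backtick.

```python
import re

# The pattern from the post, unchanged: flag any character outside
# space, A-z, 0-9 and line breaks, or the literal idiot / idi0t.
PATTERN = re.compile(r"[^ A-z0-9\n\r]|idi[o0]t")

def is_flagged(text):
    """Return True if the pattern matches anywhere in text."""
    return PATTERN.search(text) is not None

if __name__ == "__main__":
    for sample in ["idi0t4", "café", "hello!", "hello world", "a_b", "IDIOT"]:
        print(sample, is_flagged(sample))
```

In this sketch every non-ASCII letter and most punctuation is flagged, including harmless text such as "café" or "hello!", while an all-caps "IDIOT" passes when matching is case-sensitive.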

                • A
                  Alan Kilborn
                  last edited by Aug 5, 2020, 11:58 AM

                  I like the test word chosen here.

                  The Community of users of the Notepad++ text editor.