    RegEx Is there a solution here?

    • Thomas 2020

      I have a blacklist similar to this:

      idiot(a|i|e)?
      This blocks the words idiot, idiota, idioti, and idiote.
      But some users bypass this check by replacing some letters with look-alike characters.
      Example: idi0t4
      How can I write a regex that also blocks bypass attempts like this?

      • rinku singh

        find what : idiot.

        • PeterJones @Thomas 2020

          @Pan-Jan ,

          Fighting blacklisted words is a difficult task. You have to consider all alternates, including letters, numerals, and other Unicode symbols. There is no “magic” regex that will recognize them all for you.

          For a generic hint, I suggest using character classes in square brackets [aie], not the groups-with-alternations (a|i|e). There is no reason to store the result in a group. It is easier to type the class without all the | separators.

          Ignoring Unicode (for now), a case-insensitive “idiot” regex might look something like (?i)[i|:!][d][i|:!][o0][t7][aie4]?, which says:

          • case insensitive
          • things that look vaguely i-shaped
            • | is used here as a literal character, because it looks like a capital i, not because it’s being used for alternation. That is another good reason to avoid the group, where | has special meaning.
          • things that look vaguely d-shaped
          • things that look vaguely i-shaped
          • things that look vaguely o-shaped
          • things that look vaguely t-shaped
          • zero or one thing that looks vaguely like a vowel ending
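
A quick way to try that pattern outside Notepad++ is a few lines of Python. This is only a sketch: the pattern is copied from the post above, the sample strings are made up for illustration, and Python's `re` module stands in for Notepad++'s regex engine, which may differ in other details.

```python
import re

# Pattern copied from the post; (?i) makes the whole match case-insensitive.
IDIOT_RE = re.compile(r"(?i)[i|:!][d][i|:!][o0][t7][aie4]?")

# Illustrative samples (not from the thread, except idi0t4).
samples = ["idiot", "IDIOTA", "idi0t4", "!d|07", "ideal"]

# Map each sample to whether the pattern is found anywhere in it.
matches = {s: bool(IDIOT_RE.search(s)) for s in samples}

for text, hit in matches.items():
    print(f"{text!r}: {'blocked' if hit else 'allowed'}")
```

Each position is its own character class, so adding another look-alike means adding one character inside the matching pair of brackets.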
          • Thomas 2020

            But people have ideas.
            That should probably be enough.

            (?-i)[iіìílӏIƖІ][dԁDƊ][iіìílӏIƖІ][oοᴏOΟ0][tΤТƬ]
            
            • PeterJones @Thomas 2020

              @Pan-Jan said in RegEx Is there a solution here?:

              But people have ideas.

              Indeed. I even had ideas, which I shared with you.

              (?-i)

              To me, it seems strange to turn off case-insensitivity (with (?-i)) and then explicitly list both the lowercase and uppercase characters.

              If I were trying to accomplish this, and for some reason couldn’t do the filtering through a command-line script instead of doing it manually for each word inside Notepad++, I would at least use a command-line script to help write each individual regex. I’d have the script ask for a word, like idiot, and then do an internal lookup from each letter to the list of characters that I thought were similar. (For example, that mapping would be i => "[iіìílӏIƖІ]", d => "[dԁDƊ]", ... according to your similarity rules.)
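
A minimal Python sketch of such a helper might look like the following. The names `LOOKALIKES` and `build_pattern` are hypothetical, and the table only holds the four classes already posted in this thread; a real table would need an entry for every letter you care about.

```python
import re

# Hypothetical similarity table, filled with the classes posted earlier
# in this thread. Letters without an entry fall back to themselves.
LOOKALIKES = {
    "i": "iіìílӏIƖІ",
    "d": "dԁDƊ",
    "o": "oοᴏOΟ0",
    "t": "tΤТƬ",
}

def build_pattern(word):
    """Return a regex with one character class per letter of `word`."""
    parts = []
    for ch in word:
        chars = LOOKALIKES.get(ch.lower(), ch)
        # Escape each character so that ], ^, - or \ stay literal in the class.
        parts.append("[" + "".join(re.escape(c) for c in chars) + "]")
    return "".join(parts)

pattern = build_pattern("idiot")
print(pattern)  # paste the result into the "Find what" box
```

To make it interactive, the word could come from `input()` instead of being hard-coded. The generated text is then used the same way as a hand-written pattern.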

              Good luck with this. Spam filters have been trying for years.

              • Thomas 2020

                idiota
                ɪdıota
                iɗioti
                ƖDI0T4
                idiơte
                iԁioti
                
                

                There has to be a solution.

                Could this work?
                [^ A-z1-9\n\r]|idiot

                • Thomas 2020

                  [^ A-z0-9\n\r]|idi[o0]t
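
To see what this allow-list style pattern catches, here is a small Python check. It is a sketch under assumptions: the pattern is copied from the post above, the sample strings are mostly made up, `is_flagged` is a hypothetical helper, and Python's `re` is case-sensitive by default, which may not match the Notepad++ search settings in use.

```python
import re

# Pattern copied from the post: any character outside the allowed set,
# or the plain word with o/0 in the fourth position.
ALLOWLIST_RE = re.compile(r"[^ A-z0-9\n\r]|idi[o0]t")

def is_flagged(text):
    """True if the text contains a disallowed character or the word."""
    return ALLOWLIST_RE.search(text) is not None

results = {s: is_flagged(s) for s in
           ["idi0t", "ɪdıota", "hello world", "hello, world", "a_b", "idi07"]}

for text, hit in results.items():
    print(f"{text!r}: {'flagged' if hit else 'clean'}")
```

Two things are worth noticing when running it. The range A-z also covers the six ASCII characters that sit between Z and a ([, \, ], ^, _ and the backtick), so those pass. Ordinary punctuation such as a comma is outside the class, so normal sentences get flagged too, while an all-ASCII disguise like idi07 is not.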

                  • Alan Kilborn

                    I like the test word chosen here.

                    The Community of users of the Notepad++ text editor.