
    RegEx Is there a solution here?

    Help wanted · 8 posts · 4 posters
    • Thomas 2020

      I have a blacklist similar to this:

      idiot(a|i|e)?
      This blocks the words idiot, idiota, idioti, and idiote.
      But some users bypass this check by replacing some letters with look-alike characters,
      e.g. idi0t4.
      How can I make a regex list that also blocks bypass attempts like this?

      • rinku singh

        Find what: idiot.

        • PeterJones @Thomas 2020

          @Pan-Jan,

          Fighting blacklisted words is a difficult task. You have to consider all the alternatives, including letters, numerals, and other Unicode symbols. There is no “magic” regex that will recognize them all for you.

          As a general hint, I suggest using a character class in square brackets, [aie], rather than a group with alternation, (a|i|e). There is no reason to store the result in a group, and the class is easier to type without all the | separators.
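To make the difference concrete, here is a minimal sketch in Python, using its re module as a stand-in for the regex engine in Notepad++ (an assumption; for simple patterns like these the two behave the same). Both patterns accept the same words, but only the alternation stores the ending in a capture group.

```python
import re

ALTERNATION = re.compile(r"idiot(a|i|e)?")  # the original blacklist entry
CHAR_CLASS = re.compile(r"idiot[aie]?")     # same words, no capture group

for word in ["idiot", "idiota", "idioti", "idiote", "idiots"]:
    alt = ALTERNATION.fullmatch(word)
    cls = CHAR_CLASS.fullmatch(word)
    # Both patterns accept or reject each word identically...
    assert bool(alt) == bool(cls)
    # ...but only the alternation captures the ending ('a', 'i', 'e' or None).
    print(word, bool(cls), alt.groups() if alt else None, cls.groups() if cls else None)
```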

          Ignoring Unicode (for now), a case-insensitive “idiot” regex might look something like (?i)[i|:!][d][i|:!][o0][t7][aie4]?, which says:

          • case insensitive
          • things that look vaguely i-shaped
            • | here is used as a literal character because it looks like a capital i, not for alternation; that is another good reason to avoid the group, where | has special meaning
          • things that look vaguely d-shaped
          • things that look vaguely i-shaped
          • things that look vaguely o-shaped
          • things that look vaguely t-shaped
          • 0 or 1 things that look vaguely like the endings a, i, or e
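A quick way to check the pattern above is to run it against a few spellings. This Python sketch uses Python's re in place of Notepad++'s engine, and the sample words are made up for illustration. It catches simple digit and punctuation swaps, but a plain l in place of the second i still gets through, which is the "consider all the alternatives" problem in miniature.

```python
import re

# The ASCII-only pattern from the post above. Inside [...] the | is an
# ordinary character, so [i|:!] matches i, |, : or !.
LOOKALIKE_IDIOT = re.compile(r"(?i)[i|:!][d][i|:!][o0][t7][aie4]?")

for text in ["idiot", "IDIOTA", "idi0t4", "|d!07e", "idlot"]:
    verdict = "blocked" if LOOKALIKE_IDIOT.search(text) else "allowed"
    print(f"{text}: {verdict}")
```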
            • Thomas 2020

            But people have ideas.
            This should probably be enough:

            (?-i)[iіìílӏIƖІ][dԁDƊ][iіìílӏIƖІ][oοᴏOΟ0][tΤТƬ]
            
              • PeterJones @Thomas 2020

              @Pan-Jan said in RegEx Is there a solution here?:

              But people have ideas.

              Indeed. I even had ideas, which I shared with you.

              (?i)

              To me, it seems strange to turn off case-insensitivity (with (?-i)) and then explicitly list both the lowercase and uppercase characters.

              If I were trying to accomplish this, I would do the filtering with a command-line script rather than by hand for each word inside Notepad++. If for some reason I couldn't, I would at least use a command-line script to help write each individual regex: the script would ask for a word, like idiot, and then look up each letter in a table of the characters that I thought were similar. (For example, that mapping would be i => "[iіìílӏIƖІ]", d => "[dԁDƊ]", ... according to your similarity rules.)
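The helper script described above could look something like this minimal Python sketch. The script name make_pattern.py, the LOOKALIKES table and the build_pattern function are hypothetical; the i, d, o and t entries are copied from the pattern posted earlier in the thread, and letters without an entry fall back to themselves. Because the table already lists the uppercase look-alikes, the script emits no (?i) or (?-i) flag.

```python
import re
import sys

# Hypothetical look-alike table; extend it with your own similarity rules.
LOOKALIKES = {
    "i": "[iіìílӏIƖІ]",
    "d": "[dԁDƊ]",
    "o": "[oοᴏOΟ0]",
    "t": "[tΤТƬ]",
}


def build_pattern(word: str) -> str:
    """Turn a blacklisted word into a regex of look-alike character classes."""
    parts = []
    for ch in word.lower():
        # Letters without a table entry are used literally (escaped if special).
        parts.append(LOOKALIKES.get(ch, re.escape(ch)))
    return "".join(parts)


if __name__ == "__main__":
    # Usage (hypothetical): python make_pattern.py idiot
    for word in sys.argv[1:]:
        print(build_pattern(word))
```

For example, python make_pattern.py idiot prints the same character classes as the pattern posted earlier, ready to paste into the Find what box.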

              Good luck with this. Spam filters have been trying for years.

                • Thomas 2020

                idiota
                ɪdıota
                iɗioti
                ƖDI0T4
                idiơte
                iԁioti
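To see how far the earlier look-alike pattern gets, the sketch below runs it against these samples in Python (standing in for Notepad++'s engine). Python's re has no bare (?-i) switch, but it is case-sensitive by default, so the flag is simply dropped. With the classes exactly as posted, ɪdıota, iɗioti and idiơte still slip through, because ɪ, ı, ɗ and ơ are not in any of the classes.

```python
import re

# Thomas 2020's look-alike pattern from earlier in the thread, without the
# leading (?-i): Python's re is case-sensitive unless told otherwise.
UNICODE_IDIOT = re.compile("[iіìílӏIƖІ][dԁDƊ][iіìílӏIƖІ][oοᴏOΟ0][tΤТƬ]")

SAMPLES = ["idiota", "ɪdıota", "iɗioti", "ƖDI0T4", "idiơte", "iԁioti"]

for word in SAMPLES:
    verdict = "blocked" if UNICODE_IDIOT.search(word) else "slips through"
    print(f"{word}: {verdict}")
```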
                
                

                There has to be a solution.

                Could this work?
                [^ A-z1-9\n\r]|idiot

                  • Thomas 2020

                  [^ A-z0-9\n\r]|idi[o0]t
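Here is a rough Python check of this last pattern, again using Python's re in place of Notepad++'s engine; the extra test strings are made up for illustration. It flags every sample posted above, because each contains either idiot or a non-ASCII letter. The same rule also flags ordinary text such as café or anything containing a comma or period, while an all-caps IDIOT passes because the search is case-sensitive. Note also that the range A-z covers the characters [ \ ] ^ _ and ` as well as the letters.

```python
import re

# Thomas 2020's final attempt: match either a character outside
# space, A-z, 0-9, LF and CR, or the ASCII spellings idiot / idi0t.
ODD_CHAR_OR_IDIOT = re.compile(r"[^ A-z0-9\n\r]|idi[o0]t")

tests = [
    "idiota", "ɪdıota", "iɗioti", "ƖDI0T4", "idiơte", "iԁioti",  # samples from the thread
    "hello world", "Hello, world.", "café", "IDIOT",               # hypothetical extra inputs
]
for text in tests:
    verdict = "flagged" if ODD_CHAR_OR_IDIOT.search(text) else "passed"
    print(f"{text!r}: {verdict}")
```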

                    • Alan Kilborn

                    I like the test word chosen here.


                    The Community of users of the Notepad++ text editor.
                    Powered by NodeBB | Contributors