    Pythonscript search different than N++ search when using \< and leading uppercase

    25 Posts 5 Posters 13.3k Views
    • MAPJe71M
      MAPJe71
      last edited by MAPJe71

      Python 2.7.x does not interpret \< and \> as word boundaries.
      Verified it with RegexBuddy by changing the Application (Regex Flavor) from boost::regex 1.58-1.59 to python 2.7.
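A minimal sketch of that point, assuming plain Python `re` (the standard library module, not the Boost-backed search that Notepad++ uses): a backslash in front of `<` or `>` simply escapes the character, so the pattern looks for literal angle brackets. The sample strings are made up for illustration.

```python
import re

# In Python's re module, r'\<' and r'\>' are just escaped literal characters,
# not start-of-word / end-of-word assertions.
pattern = re.compile(r'\<print\>')

print(pattern.search('print') is None)        # True: the bare word is not matched
print(pattern.search('<print>') is not None)  # True: literal brackets are matched

# The nearest plain-Python spelling of a whole-word search uses \b instead.
word = re.compile(r'\bprint\b')
print(word.search('a print b') is not None)   # True
```

This only describes the `re` module itself; it says nothing about how the same pattern behaves when it is handed to the Boost engine.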

      • Claudia FrankC
        Claudia Frank @MAPJe71
        last edited by

        @MAPJe71

        Yes, but editor.research is not using the python engine; it is using the boost engine, afaik.

        Cheers
        Claudia

        • Claudia FrankC
          Claudia Frank
          last edited by

          quick update
          I think I found the difference, plus something that may be a glitch (depending on your point of view).

          When searching with the npp find/replace dialog, the default is ignorecase.
          The matchcase flag is only set if the checkbox is checked (makes sense).

          When searching via pythonscript and editor.research, the default is matchcase.
          Case is ignored only if the flag re.I is provided (again, makes sense).

          Our assumption was that, regardless of what has been selected, using
          (?i) means the search is treated as ignorecase, and this is not the case.

          If you do a search via the npp dialog using matchcase checkbox and (?i)
          you will see that it uses the matchcase instead of ignorecase.

          So - at the moment I would say, both behave the same.

          I will do some further tests later.

          Cheers
          Claudia

          • Scott SumnerS
            Scott Sumner @Claudia Frank
            last edited by Scott Sumner

            @Claudia-Frank said:

            If you do a search via the npp dialog using matchcase checkbox and (?i)
            you will see that it uses the matchcase instead of ignorecase.

            Okay, so now I’m confused. If I try a very simple interactive search (forget the \< thing discussed earlier), where I lead off the Find-what box with (?i), it makes no difference whether the “match case” checkbox is checked or not, my simple search gets matches that ignore case. @Claudia-Frank , doesn’t this result go against what you just said?

            • Claudia FrankC
              Claudia Frank @Scott Sumner
              last edited by

              @Scott-Sumner

              No, if you take into account that research gives the same result, but
              yes, if you take that statement in general.
              I’m comparing npp dialog search and pythonscript research results.

              But you are right, confusing!

              From testing it seems that research and npp search behave the same
              if flags are set the same.

              Cheers
              Claudia

              • Alan KilbornA
                Alan Kilborn @Claudia Frank
                last edited by

                @Claudia-Frank , do you still have more testing to do, or have you taken it as far as it can go? If you are done, then I have questions. :)

                The main question is: when writing Pythonscript, should the golden rule be to forget about embedding (?i) at the front of a regular expression string, and instead set the flags parameter to re.IGNORECASE (or re.I, or even 2) to get the desired behavior?

                I always liked using (?i) because that way, when reading a regex from left to right, I could prepare myself properly, rather than having to look for a somewhat disconnected flags parameter. But, if things aren’t going to work right, I can learn another way.

                However, I’m still confused about what I should do to make things work right in all cases… :(

                • Claudia FrankC
                  Claudia Frank @Alan Kilborn
                  last edited by

                  @Alan-Kilborn ,
                  at the moment, it looks like my last statement (same flags, same results) is correct.
                  But this could mean either that I just haven’t found the regex which breaks it, or that it is simply true.
                  I tend towards the latter :-)

                  Does it matter in regards to your script?
                  I assume, only if you want to compare with the interactive search dialog or
                  when creating the regex with the dialog and afterwards using it in a script.

                  If my statement about the flags is true, and you want the script to behave exactly the same
                  as npp's find dialog does, then you need to provide the same flags as npp does.

                  I do see the advantage of using the modifiers, but it looks like neither pythonscript
                  nor npp does it right in terms of the issue you found.

                  Cheers
                  Claudia

                  • Claudia FrankC
                    Claudia Frank
                    last edited by

                    OK, from my point of view it looks like it is as I’ve already described.
                    If the npp find dialog and pythonscript research are using the same flags and regex,
                    the result is the same.

                    So, pythonscript users need to be aware that npp sets the ignorecase flag
                    by default.
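A small sketch of what "same flags as the dialog" could look like in a script. The helper name `flags_like_find_dialog` is hypothetical, and the mapping is based only on the behavior described in this thread (dialog ignores case unless "Match case" is checked; a scripted search matches case unless re.IGNORECASE is passed).

```python
import re

def flags_like_find_dialog(match_case_checked):
    """Return a flags value mirroring the Find dialog's "Match case" checkbox.

    Hypothetical helper: 0 means "match case", re.IGNORECASE means "ignore case".
    """
    return 0 if match_case_checked else re.IGNORECASE

print(int(flags_like_find_dialog(True)))   # 0
print(int(flags_like_find_dialog(False)))  # 2
```

The returned value would then be passed as the flags parameter of the scripted search, so that a regex built in the dialog is run under the same case setting.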

                    If someone finds different behavior - please report.

                    Cheers
                    Claudia

                    • Alan KilbornA
                      Alan Kilborn @Claudia Frank
                      last edited by

                      @Claudia-Frank said:

                      Does it matter in regards to your script?

                      Well…what matters in regards to my script is that the regex search works correctly. The discussion about interactive versus scripted search and embedded flags versus separate flags is nice, but…

                      The bottom line is that the scripted search has a problem, and that problem is seemingly related to the use of \< and \> in the regex, which I would like to be able to use. So, okay, we can “fix” that by working around it by playing with the flags stuff, and that is all fine, I guess.

                      But should that become a general rule for using Pythonscript’s editor.research() function, based upon someone finding one weird case? Maybe there is no great answer to that at this point, but I’m attempting to create something general purpose, where I can’t test all possible regexes for how they work at coding time, as they are input at run-time.

                      So I’m left wondering if the best thing to do when coding with editor.research() is to never embed (?i) or (?-i) in the regex, and instead always use the flags parameter. Perhaps based upon the data we have at this moment, that is the best course of action.
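For regexes that arrive at run-time, one possible way to follow that rule is to peel a leading (?i) or (?-i) off the user's input and turn it into a flags value. This is only a sketch: `split_case_modifier` is a hypothetical helper, it handles nothing but a modifier at the very start of the pattern, and it leaves every other inline modifier untouched.

```python
import re

IGNORECASE = int(re.IGNORECASE)  # plain int, 2

# Recognizes only a leading (?i) or (?-i).
_LEADING_CASE_MODIFIER = re.compile(r'^\(\?(-?)i\)')

def split_case_modifier(pattern, default_flags=0):
    """Strip a leading (?i)/(?-i) and return (remaining_pattern, flags)."""
    m = _LEADING_CASE_MODIFIER.match(pattern)
    if m is None:
        return pattern, default_flags
    rest = pattern[m.end():]
    if m.group(1) == '-':
        # (?-i): force case-sensitive matching
        return rest, default_flags & ~IGNORECASE
    # (?i): force case-insensitive matching
    return rest, default_flags | IGNORECASE

print(split_case_modifier(r'(?i)\<print\>'))   # ('\\<print\\>', 2)
print(split_case_modifier(r'(?-i)xYZ', 2))     # ('xYZ', 0)
print(split_case_modifier(r'xYZ'))             # ('xYZ', 0)
```

Inside Pythonscript, the two returned values would be handed to editor.research() as the search string and the flags parameter, so the user can keep typing (?i) while the script never sends it to the engine.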

                      • Claudia FrankC
                        Claudia Frank @Alan Kilborn
                        last edited by

                        @Alan-Kilborn

                        Well…what matters in regards to my script is that the regex search works correctly.

                        More philosophically: what does “correct” mean when we talk about regex?
                        As long as the same regex can be interpreted differently by different regex engines,
                        how can we be sure what is right or wrong?

                        I agree, I also assumed that using the modifiers overrules the flags, but with the regex
                        you used we see that it doesn’t. I don’t know if this is a bug in the way the boost
                        regex engine has been integrated or a bug in the boost regex engine itself. I don’t think I will go into that level of detail.

                        Concerning the correctness of the scripts - you found one misbehavior, I’m quite certain that others exist.

                        The bottom line is that the scripted search has a problem, and that problem is seemingly related to the use of \< and \> in the regex

                        Just to be clear, both npp and pythonscript behave the same. It is not only pythonscript that does this.

                        which I would like to be able to use. So, okay, we can “fix” that by working around it by playing with the flags stuff, and that is all fine, I guess.

                        From the test it looks like only the “start of word” and “end of word” word boundaries behave strangely.
                        If you stop using this particular item you might have already solved it.
                        Until you discover the next issue.

                        But should that become a general rule for using Pythonscript’s editor.research() function, based upon someone finding one weird case?
                        Maybe there is no great answer to that at this point, but I’m attempting to create something general purpose,
                        where I can’t test all possible regexes for how they work at coding time, as they are input at run-time.
                        So I’m left wondering if the best thing to do when coding with editor.research() is to never embed (?i) or (?-i) in the regex,
                        and instead always use the flags parameter. Perhaps based upon the data we have at this moment, that is the best course of action.

                        Personally, I now avoid using the modifiers in scripts and use flags when needed.

                        Cheers
                        Claudia

                        • MAPJe71M
                          MAPJe71
                          last edited by

                          Wondering if it’s only an issue with editor.research( ... ), as I have been using modifiers with editor.rereplace( ... ) successfully.

                          • Claudia FrankC
                            Claudia Frank @MAPJe71
                            last edited by Claudia Frank

                            @MAPJe71

                            Yep - same issue.

                            In a new doc containing only the word Print,

                            editor.rereplace('(?i)\<print\>', 'PRINT')

                            fails, whereas

                            editor.rereplace('\<print\>', 'PRINT', 2)

                            works.

                            UUhhh - I should avoid magic numbers - 2=re.I
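The same working call, sketched with a named constant instead of the magic number. The guarded import is an assumption made so the snippet can also be loaded outside Notepad++: `Npp.editor` only exists inside the PythonScript plugin, so outside of it nothing is replaced.

```python
import re

try:
    from Npp import editor  # only available inside Notepad++ PythonScript
except ImportError:
    editor = None           # running outside Notepad++: skip the replace

PATTERN = r'\<print\>'      # raw string keeps the backslashes as typed
FLAGS = re.IGNORECASE       # same value as re.I, i.e. 2

if editor is not None:
    # Same call as above, with the flag spelled out.
    editor.rereplace(PATTERN, 'PRINT', FLAGS)
```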

                            Cheers
                            Claudia

                            • MAPJe71M
                              MAPJe71
                              last edited by

                              UUhhh - I should avoid magic numbers - 2=re.I

                              Damn right!!! LOL

                              • guy038G
                                guy038
                                last edited by guy038

                                Hi, All,

                                I did some tests, using the classical Find dialog, with the original text, below :

                                xyz
                                xYz
                                xyZ
                                xYZ
                                Xyz
                                XYz
                                XyZ
                                XYZ
                                -----
                                1xyz
                                1xYz
                                1xyZ
                                1xYZ
                                1Xyz
                                1XYz
                                1XyZ
                                1XYZ
                                -----
                                xyz9
                                xYz9
                                xyZ9
                                xYZ9
                                Xyz9
                                XYz9
                                XyZ9
                                XYZ9
                                -----
                                1xyz9
                                1xYz9
                                1xyZ9
                                1xYZ9
                                1Xyz9
                                1XYz9
                                1XyZ9
                                1XYZ9
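For anyone who wants to rebuild this test text, here is a small sketch that generates the same 32 lines (plus the three separators). The function names are made up, and the order of the variants inside each block follows itertools.product, so it differs slightly from the order typed above.

```python
from itertools import product

def case_variants(word):
    """Return every upper/lower-case combination of word."""
    choices = [(ch.lower(), ch.upper()) for ch in word]
    return [''.join(combo) for combo in product(*choices)]

def build_test_text(word='xyz', prefix='1', suffix='9'):
    """Four blocks: bare word, digit before, digit after, digit on both sides."""
    blocks = []
    for pre, suf in (('', ''), (prefix, ''), ('', suffix), (prefix, suffix)):
        blocks.append('\n'.join(pre + v + suf for v in case_variants(word)))
    return '\n-----\n'.join(blocks)

text = build_test_text()
print(text)
```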
                                

                                Then I tested the following regexes :

                                • xYZ
                                • xYZ\>
                                • xYZ\b
                                • \<xYZ
                                • \bxYZ
                                • \<xYZ\>
                                • \bxYZ\b
                                • (^|(?<=\W))xYZ((?=\W)|$)

                                under each of these six conditions :

                                • With the Match case option ON
                                • With the Match case option OFF
                                • Preceded by (?i) and with the Match case option ON
                                • Preceded by (?i) and with the Match case option OFF
                                • Preceded by (?-i) and with the Match case option ON
                                • Preceded by (?-i) and with the Match case option OFF

                                I obtained the six following tables, where :

                                • A correct match is indicated by a * character

                                • An incorrect result, i.e. an expected match that was not found, is indicated by the letter E ( Error )


                                +-------+------------------------------------------------------------------------------------+
                                |       |            Option "Match case"  ON           and           Regex, below            |
                                | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | xyz   |     |       |       |       |       |         |         |                          |
                                | xYz   |     |       |       |       |       |         |         |                          |
                                | xyZ   |     |       |       |       |       |         |         |                          |
                                | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | Xyz   |     |       |       |       |       |         |         |                          |
                                | XYz   |     |       |       |       |       |         |         |                          |
                                | XyZ   |     |       |       |       |       |         |         |                          |
                                | XYZ   |     |       |       |       |       |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | 1xyz  |     |       |       |       |       |         |         |                          |
                                | 1xYz  |     |       |       |       |       |         |         |                          |
                                | 1xyZ  |     |       |       |       |       |         |         |                          |
                                | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1Xyz  |     |       |       |       |       |         |         |                          |
                                | 1XYz  |     |       |       |       |       |         |         |                          |
                                | 1XyZ  |     |       |       |       |       |         |         |                          |
                                | 1XYZ  |     |       |       |       |       |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | xyz9  |     |       |       |       |       |         |         |                          |
                                | xYz9  |     |       |       |       |       |         |         |                          |
                                | xyZ9  |     |       |       |       |       |         |         |                          |
                                | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | Xyz9  |     |       |       |       |       |         |         |                          |
                                | XYz9  |     |       |       |       |       |         |         |                          |
                                | XyZ9  |     |       |       |       |       |         |         |                          |
                                | XYZ9  |     |       |       |       |       |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | 1xyz9 |     |       |       |       |       |         |         |                          |
                                | 1xYz9 |     |       |       |       |       |         |         |                          |
                                | 1xyZ9 |     |       |       |       |       |         |         |                          |
                                | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                | 1Xyz9 |     |       |       |       |       |         |         |                          |
                                | 1XYz9 |     |       |       |       |       |         |         |                          |
                                | 1XyZ9 |     |       |       |       |       |         |         |                          |
                                | 1XYZ9 |     |       |       |       |       |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                
                                
                                +-------+------------------------------------------------------------------------------------+
                                |       |            Option "Match case"  OFF          and           Regex, below            |
                                | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | xYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | xyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | Xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | XYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | XyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | XYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | 1xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1xYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1xyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1Xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1XYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1XyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1XYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | xYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | xyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | Xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | XYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | XyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | XYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | 1xyz9 |  *  |       |       |       |       |         |         |                          |
                                | 1xYz9 |  *  |       |       |       |       |         |         |                          |
                                | 1xyZ9 |  *  |       |       |       |       |         |         |                          |
                                | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                | 1Xyz9 |  *  |       |       |       |       |         |         |                          |
                                | 1XYz9 |  *  |       |       |       |       |         |         |                          |
                                | 1XyZ9 |  *  |       |       |       |       |         |         |                          |
                                | 1XYZ9 |  *  |       |       |       |       |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                
                                
                                +-------+------------------------------------------------------------------------------------+
                                |       |       Option "Match case"  ON       and       Regex, below, PRECEDED by (?i)       |
                                | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | xYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | xyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                | Xyz   |  *  |   *   |   *   |   E   |   *   |    E    |    *    |            *             |
                                | XYz   |  *  |   *   |   *   |   E   |   *   |    E    |    *    |            *             |
                                | XyZ   |  *  |   *   |   *   |   E   |   *   |    E    |    *    |            *             |
                                | XYZ   |  *  |   *   |   *   |   E   |   *   |    E    |    *    |            *             |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | 1xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1xYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1xyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1Xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1XYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1XyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                | 1XYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | xYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | xyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                | Xyz9  |  *  |       |       |   E   |   *   |         |         |                          |
                                | XYz9  |  *  |       |       |   E   |   *   |         |         |                          |
                                | XyZ9  |  *  |       |       |   E   |   *   |         |         |                          |
                                | XYZ9  |  *  |       |       |   E   |   *   |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                | 1xyz9 |  *  |       |       |       |       |         |         |                          |
                                | 1xYz9 |  *  |       |       |       |       |         |         |                          |
                                | 1xyZ9 |  *  |       |       |       |       |         |         |                          |
                                | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                | 1Xyz9 |  *  |       |       |       |       |         |         |                          |
                                | 1XYz9 |  *  |       |       |       |       |         |         |                          |
                                | 1XyZ9 |  *  |       |       |       |       |         |         |                          |
                                | 1XYZ9 |  *  |       |       |       |       |         |         |                          |
                                +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
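As a cross-check for the cells marked E, the expected result can be modelled with Python's own `re` module. This is a reference sketch only, not the Boost engine: since `re` has no \< and \>, they are rewritten here as `\b(?=\w)` and `\b(?<=\w)`, which is an assumption about their intended meaning (start of word, end of word). Each line is searched on its own, and the function names are hypothetical.

```python
import re

REGEXES = [
    r'xYZ', r'xYZ\>', r'xYZ\b', r'\<xYZ', r'\bxYZ',
    r'\<xYZ\>', r'\bxYZ\b', r'(^|(?<=\W))xYZ((?=\W)|$)',
]

def to_python_re(boost_regex):
    r"""Rewrite \< and \> into lookaround-based equivalents for Python's re."""
    return (boost_regex.replace(r'\<', r'\b(?=\w)')
                       .replace(r'\>', r'\b(?<=\w)'))

def should_match(boost_regex, line, ignore_case):
    """True if the line is expected to contain a match under this model."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(to_python_re(boost_regex), line, flags) is not None

# The cells marked E in the first block of the table above.
for line in ('Xyz', 'XYz', 'XyZ', 'XYZ'):
    print(line, should_match(r'\<xYZ', line, True),
          should_match(r'\<xYZ\>', line, True))
```

Under this model every one of those lines is a match when case is ignored, which is what the E cells were expected to show.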
                                
                                
                                
                                • guy038G
                                  guy038
                                  last edited by guy038

                                  I need to split this post in two parts because it exceeds 16384 characters !

                                  +-------+------------------------------------------------------------------------------------+
                                  |       |       Option "Match case"  OFF       and       Regex, below, PRECEDED by (?i)      |
                                  | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                  |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  | xYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  | xyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  | Xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  | XYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  | XyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  | XYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | 1xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1xYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1xyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1Xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1XYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1XyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1XYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  | xYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  | xyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  | Xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  | XYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  | XyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  | XYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | 1xyz9 |  *  |       |       |       |       |         |         |                          |
                                  | 1xYz9 |  *  |       |       |       |       |         |         |                          |
                                  | 1xyZ9 |  *  |       |       |       |       |         |         |                          |
                                  | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                  | 1Xyz9 |  *  |       |       |       |       |         |         |                          |
                                  | 1XYz9 |  *  |       |       |       |       |         |         |                          |
                                  | 1XyZ9 |  *  |       |       |       |       |         |         |                          |
                                  | 1XYZ9 |  *  |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  
                                  
                                  +-------+------------------------------------------------------------------------------------+
                                  |       |       Option "Match case"  ON       and       Regex, below, PRECEDED by (?-i)      |
                                  | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                  |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | xyz   |     |       |       |       |       |         |         |                          |
                                  | xYz   |     |       |       |       |       |         |         |                          |
                                  | xyZ   |     |       |       |       |       |         |         |                          |
                                  | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                  | Xyz   |     |       |       |       |       |         |         |                          |
                                  | XYz   |     |       |       |       |       |         |         |                          |
                                  | XyZ   |     |       |       |       |       |         |         |                          |
                                  | XYZ   |     |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | 1xyz  |     |       |       |       |       |         |         |                          |
                                  | 1xYz  |     |       |       |       |       |         |         |                          |
                                  | 1xyZ  |     |       |       |       |       |         |         |                          |
                                  | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1Xyz  |     |       |       |       |       |         |         |                          |
                                  | 1XYz  |     |       |       |       |       |         |         |                          |
                                  | 1XyZ  |     |       |       |       |       |         |         |                          |
                                  | 1XYZ  |     |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | xyz9  |     |       |       |       |       |         |         |                          |
                                  | xYz9  |     |       |       |       |       |         |         |                          |
                                  | xyZ9  |     |       |       |       |       |         |         |                          |
                                  | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                  | Xyz9  |     |       |       |       |       |         |         |                          |
                                  | XYz9  |     |       |       |       |       |         |         |                          |
                                  | XyZ9  |     |       |       |       |       |         |         |                          |
                                  | XYZ9  |     |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | 1xyz9 |     |       |       |       |       |         |         |                          |
                                  | 1xYz9 |     |       |       |       |       |         |         |                          |
                                  | 1xyZ9 |     |       |       |       |       |         |         |                          |
                                  | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                  | 1Xyz9 |     |       |       |       |       |         |         |                          |
                                  | 1XYz9 |     |       |       |       |       |         |         |                          |
                                  | 1XyZ9 |     |       |       |       |       |         |         |                          |
                                  | 1XYZ9 |     |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  
                                  
                                  +-------+------------------------------------------------------------------------------------+
                                  |       |       Option "Match case"  OFF       and       Regex, below, PRECEDED by (?-i)     |
                                  | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                  |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | xyz   |     |       |       |       |       |         |         |                          |
                                  | xYz   |     |       |       |       |       |         |         |                          |
                                  | xyZ   |     |       |       |       |       |         |         |                          |
                                  | xYZ   |  *  |   *   |   *   |   *   |       |    *    |    *    |            *             |
                                  | Xyz   |     |       |       |       |       |         |         |                          |
                                  | XYz   |     |       |       |       |       |         |         |                          |
                                  | XyZ   |     |       |       |       |       |         |         |                          |
                                  | XYZ   |     |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | 1xyz  |     |       |       |       |       |         |         |                          |
                                  | 1xYz  |     |       |       |       |       |         |         |                          |
                                  | 1xyZ  |     |       |       |       |       |         |         |                          |
                                  | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                  | 1Xyz  |     |       |       |       |       |         |         |                          |
                                  | 1XYz  |     |       |       |       |       |         |         |                          |
                                  | 1XyZ  |     |       |       |       |       |         |         |                          |
                                  | 1XYZ  |     |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | xyz9  |     |       |       |       |       |         |         |                          |
                                  | xYz9  |     |       |       |       |       |         |         |                          |
                                  | xyZ9  |     |       |       |       |       |         |         |                          |
                                  | xYZ9  |  *  |       |       |   *   |       |         |         |                          |
                                  | Xyz9  |     |       |       |       |       |         |         |                          |
                                  | XYz9  |     |       |       |       |       |         |         |                          |
                                  | XyZ9  |     |       |       |       |       |         |         |                          |
                                  | XYZ9  |     |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  | 1xyz9 |     |       |       |       |       |         |         |                          |
                                  | 1xYz9 |     |       |       |       |       |         |         |                          |
                                  | 1xyZ9 |     |       |       |       |       |         |         |                          |
                                  | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                  | 1Xyz9 |     |       |       |       |       |         |         |                          |
                                  | 1XYz9 |     |       |       |       |       |         |         |                          |
                                  | 1XyZ9 |     |       |       |       |       |         |         |                          |
                                  | 1XYZ9 |     |       |       |       |       |         |         |                          |
                                  +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                  

                                  From these results, we can deduce that the N++ Boost regex engine fails to match 4 ( or 8 ) cases, ONLY IF the four conditions below occur simultaneously :

                                  • The Match case option, of the Find dialog, is ON

                                  • A (?i) modifier starts the regex

                                  • The regex begins with the \< assertion

                                  • The text to match begins with an UPPERCASE letter


                                  Luckily :

                                  • The regex \<xYZ can be replaced by the regex \bxYZ

                                  • The regex \<xYZ\> can be replaced by the regex \bxYZ\b

                                  And note that :

                                  • The assertion \< may be replaced by the assertion (^|(?<=\W))

                                  • The assertion \> may be replaced by the assertion ((?=\W)|$)
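
                                  As a rough illustration outside Notepad++, the two look-around substitutes above can be tried with Python's standard re module, which has no \< or \> assertions of its own. This is only a sketch: the helper name find_whole_word and the sample text are made up for the example, non-capturing groups are used instead of the capturing ones shown above, and Python's engine is not the Boost engine discussed in this thread.

```python
import re

# Look-around substitutes quoted in the post above (non-capturing variants).
WORD_START = r"(?:^|(?<=\W))"   # stands in for \<
WORD_END = r"(?:(?=\W)|$)"      # stands in for \>


def find_whole_word(word, text, ignore_case=False):
    """Return every whole-word occurrence of `word` in `text` (hypothetical helper)."""
    flags = re.MULTILINE            # let ^ and $ work on every line
    if ignore_case:
        flags |= re.IGNORECASE
    pattern = WORD_START + re.escape(word) + WORD_END
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


# One candidate per line, in the spirit of the test tables above.
sample = "xYZ\n1xYZ\nxYZ9\n1xYZ9\nXYZ"

case_sensitive = find_whole_word("xYZ", sample)
case_insensitive = find_whole_word("xYZ", sample, ignore_case=True)
print(case_sensitive)      # only the stand-alone "xYZ"
print(case_insensitive)    # the stand-alone "xYZ" and "XYZ"
```

                                  The candidates glued to a digit ( 1xYZ, xYZ9 and 1xYZ9 ) are rejected in both runs, which is the behaviour expected from \<xYZ\>.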

                                  Cheers

                                  guy038

                                  BTW, Claudia, when you said :

                                  I agree, I assumed too, that using the modifiers overrules the flags but with the regex
                                  you used we see it isn’t.

                                  I disagree ! Indeed :

                                  Considering the text below :

                                  xyz
                                  xYz
                                  xyZ
                                  xYZ
                                  Xyz
                                  XYz
                                  XyZ
                                  XYZ
                                  

                                  Of course, the two regexes (?i)\<xYZ and (?i)\<xYZ\>, with the Match case option ON, match the first four cases, only

                                  But, the two regexes (?i)\<XYZ and (?i)\<XYZ\>, with the Match case option ON, ALSO match the first four cases, only !

                                  So, I do think that the in-line modifiers (?i) and (?-i) ALWAYS have priority over the Match case option

                                  And, like you, I rather think that it’s just a bug [ in the implementation ] of the Boost regex engine !

                                  Besides, the two regexes (?i)\bxYZ and (?i)\bxYZ\b, with the Match case option ON, correctly match the eight cases, above, of the string “xyz” :-)) => The (?i) modifier forces the insensitive search !

                                  • Claudia FrankC
                                    Claudia Frank
                                    last edited by

                                    Hi Guy,

                                    thx for your effort on this but I have to disagree with your disagree ;-D

                                    From the regex execution point of view, there are two ways to change
                                    the case behavior: either by providing a flag or by using the in-line modifiers.
                                    When only the flag is provided, everything is ok (at least for the moment) - so
                                    I have to assume that the regex engine works correctly in this case.
                                    When both the in-line modifier and the flag are provided, it isn’t always ok.
                                    This means there is a bug, and it must be related to how in-line modifiers
                                    are handled against flags. That makes me think that the bug is in doing
                                    this override - which means we can’t rely on it. Maybe other in-line modifiers
                                    together with some special regex constructs behave wrongly as well.
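
                                    For comparison only, the same two mechanisms ( flag versus in-line modifier ) exist in Python's standard re module, and a small sketch shows what "the modifier overrides the flag" is expected to look like. Assumptions: this is plain Python 3.6 or later ( the scoped form (?-i:...) does not exist in Python 2.7 ), the variable names are invented for the example, and nothing here exercises the Boost engine used by the Find dialog or by editor.research.

```python
import re

# The eight case variants of "xyz" used in the tables earlier in the thread.
variants = ["xyz", "xYz", "xyZ", "xYZ", "Xyz", "XYz", "XyZ", "XYZ"]

# 1) Case-insensitivity requested through the flag only.
by_flag = [v for v in variants if re.search(r"\bxYZ\b", v, re.IGNORECASE)]

# 2) Case-insensitivity requested through the in-line modifier only.
by_modifier = [v for v in variants if re.search(r"(?i)\bxYZ\b", v)]

# 3) The flag says "ignore case", but the scoped modifier (?-i:...) switches
#    it back off for the enclosed part of the pattern.
modifier_over_flag = [
    v for v in variants if re.search(r"(?-i:\bxYZ\b)", v, re.IGNORECASE)
]

print(by_flag)
print(by_modifier)
print(modifier_over_flag)
```

                                    In this sketch the first two lists contain all eight variants, while the third contains only xYZ, i.e. the modifier wins over the flag.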

                                    Cheers
                                    Claudia

                                    • guy038G
                                      guy038
                                      last edited by guy038

                                      Hello, Claudia and All,

                                      Hum…, finally, Claudia, I think that you’re right :-) Indeed, if we build the general table below, which recapitulates the main cases, it’s obvious that :

                                      • Results are OK, when ONLY the “Match case” flag is used, WITHOUT any in-line modifier ( Rows 1 and 4 )

                                      • Results seem OK, ( UP TO NOW ), whether the “Match case” flag is OFF or ON, with a starting (?-i) in-line modifier ( Rows 3 and 6 )

                                      • Results seem OK, ( UP TO NOW ), when the “Match case” flag is OFF, with a starting (?i) in-line modifier ( Row 2 )

                                      • Results are NOT OK, when the “Match case” flag is ON, with a starting (?i) in-line modifier and a regex which begins with the \< assertion ( Row 5 )

                                      Luckily, this LAST case ( Row 5 ) is rather rare and does not occur if we use the \b syntax, instead of \< :-))

                                      +=======+=======================+====================+===========+==================+
                                      |  Row  |   "Match case" flag   |  In-line modifier  |  Results  |     Remarks      |
                                      +=======+=======================+====================+===========+==================+
                                      |   1   |          OFF          |         NO         |  Correct  |  Implicit (?i)   |
                                      +-------+-----------------------+--------------------+-----------+------------------+
                                      |   2   |          OFF          |        (?i)        |  Correct  |                  |
                                      +-------+-----------------------+--------------------+-----------+------------------+
                                      |   3   |          OFF          |        (?-i)       |  Correct  |                  |
                                      +=======+=======================+====================+===========+==================+
                                      |   4   |          ON           |         NO         |  Correct  |  Implicit (?-i)  |
                                      +-------+-----------------------+--------------------+-----------+------------------+
                                      |   5   |          ON           |        (?i)        |  PROBLEM  |  IF use of \<    |
                                      +-------+-----------------------+--------------------+-----------+------------------+
                                      |   6   |          ON           |        (?-i)       |  Correct  |                  |
                                      +=======+=======================+====================+===========+==================+
                                      

                                      Cheers,

                                      guy038

                                      • Alan KilbornA
                                        Alan Kilborn
                                        last edited by

                                        First of all, it is great to see such rousing discussion about the issue I discovered! :-) Thanks to all for that.

                                        There are lots of things to think about coming out of this discussion, but the most obvious and immediate one is a question for Mr Guy: You keep suggesting to use \b instead of \< , but they are not always equivalent, correct? They may be equivalent for certain examples, but in the most general case I believe they are different. If they weren’t different, there would be no reason for both to exist in the N++/Boost engine…

                                        I mean, even I discussed using \b instead in my very first posting in this thread, but that was just as a test, not necessarily a blanket substitution. I guess I don’t want others reading this thread to take away that \b and \< are the exact same thing.

                                        Comments? Thoughts?

                                        • MAPJe71M
                                          MAPJe71
                                          last edited by

                                          See reference on Word Boundaries for

                                          • description on differences between \b, \< and \>;
                                          • which “engine” supports what.
                                          • guy038G
                                            guy038
                                            last edited by guy038

                                            Hi Alan and MapJe71,

                                            Thanks, MapJe71, for the link about Word Boundaries, from the definitive site about regular expressions ! Of course, Alan, I know the differences between the three assertions : \b , \< and \>. I just preferred not to speak about them at first, in order to stay focused on your problem !

                                            To be short, the \b assertion acts, either, as a \< assertion OR as a \> assertion. This explains that the regex \<WORD\> can be simply replaced by the regex \bWORD\b.
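
                                            A tiny sketch may make the relationship concrete. It uses Python's standard re module ( which lacks \< and \> ), so the start-of-word and end-of-word assertions are emulated with look-arounds; the variable names are invented for the example and the Boost engine is not involved.

```python
import re

text = "abc def"

# \b fires at every transition between a word and a non-word character.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

# Look-around stand-ins for \< (start of word) and \> (end of word).
word_starts = [m.start() for m in re.finditer(r"(?<!\w)(?=\w)", text)]
word_ends = [m.start() for m in re.finditer(r"(?<=\w)(?!\w)", text)]

print(boundaries)    # positions where \b matches
print(word_starts)   # positions where \< would match
print(word_ends)     # positions where \> would match
```

                                            The \b positions are the union of the other two lists. So \<WORD can be written \bWORD only because WORD itself begins with a word character; on its own, \b is not a synonym of \<.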

                                            BTW, in the Word Boundaries table, I noticed the POSIX word boundaries ( [[:<:]] and [[:>:]] ), which have exactly the same meaning as the GNU word boundaries ( \< and \> ). These syntaxes are functional with the N++ Boost regex engine ! Unfortunately, Alan, the problem that you noticed does occur with the POSIX word boundaries, too :-((.


                                            On top of that, the LAST row of the “Word Boundaries” table, named Word Boundaries behaviour, says that “word boundaries” are not correctly handled in most regex engines :

                                            Word boundaries always match at the start of the match attempt if that position is followed by a word character, regardless of the character that precedes the start of the match attempt. (Thus, word boundaries are not handled correctly for the second and following match attempts in the same string.)

                                            And it shows an example :

                                            \b. matches all of the letters but not the space when iterating over all matches, in the string “abc def”
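
                                            As a point of reference, here is what an engine that does look at the preceding character returns for the same example. The sketch uses Python's standard re module, chosen only because it is easy to run; it says nothing about how the Boost engine behaves.

```python
import re

# \b. = a word boundary followed by any single character.
matches = re.findall(r"\b.", "abc def")
print(matches)   # → ['a', ' ', 'd']
```

                                            Only the a, the space ( which follows the end-of-word boundary after c ) and the d are matched, instead of every letter.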


                                            So, I did some tests ( again !! )

                                            • I copied the single sentence below, part of the license.txt file, into a new tab
                                            By contrast, the GNU General Public License is intended to guarantee your freedom...
                                            
                                            • In the Find dialog, I left the Match case and the . matches newline options UNCHECKED

                                            • I selected, of course, the Regular expression search mode

                                            • I tested the different regexes, below, against the example text
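
                                            The same sentence can also be checked outside Notepad++, as a baseline. The sketch below assumes Python's standard re module and a non-capturing variant of the (^|(?<=\W)) construct; the variable names are made up, and the output describes Python's engine only.

```python
import re

sentence = ("By contrast, the GNU General Public License is intended "
            "to guarantee your freedom...")

# First character of each word, found with the plain \b assertion...
first_letters_b = re.findall(r"\b\w", sentence)

# ...and with the look-behind stand-in for the start-of-word assertion.
first_letters_lookbehind = re.findall(r"(?:^|(?<=\W))\w", sentence)

print("".join(first_letters_b))
print("".join(first_letters_lookbehind))
```

                                            Both patterns return one character per word, thirteen in total, in this engine.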

                                            REMARK : In the table, below, each dash character, under the sentence, indicates a match of the corresponding regex(es) !

                                            ========================================================================================================================
                                            |     REGEXES     |                EXAMPLE text    -     MATCHES noted by a DASH character               |   RESULTS   |
                                            ========================================================================================================================
                                            |                 |                                                                                      |             |
                                            |                 | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT ! |
                                            |  (^|(?<!\w)).   | ------------------------------------------------------------------------------------ |             |
                                            |                 |                                                                                      |             |
                                            +-----------------+--------------------------------------------------------------------------------------+-------------+
                                            |                 |                                                                                      |             |
                                            |  \b.            |                                                                                      |             |
                                            |  \<.            |                                                                                      |             |
                                            |  [[:<:]].       |                                                                                      |             |
                                            |                 |                                                                                      |             |
                                            |                 | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT ! |
                                            |  \b\w           | -- --------  --- --- ------- ------ ------- -- -------- -- --------- ---- -------    |             |
                                            |  \<\w           |                                                                                      |             |
                                            |  [[:<:]]\w      |                                                                                      |             |
                                            |  (^|(?<!\w))\w  |                                                                                      |             |
                                            |                 |                                                                                      |             |
                                            +-----------------+--------------------------------------------------------------------------------------+-------------+
                                            |                 |                                                                                      |             |
                                            |                 | By contrast, the GNU General Public License is intended to guarantee your freedom... |  INCORRECT  |
                                            |  (^|(?<=\W)).   | -  -        -    -   -       -      -       -  -        -  -         -    -       -  |             |
                                            |                 |                                                                                      |             |
                                            +-----------------+--------------------------------------------------------------------------------------+-------------+
                                            |                 |                                                                                      | (At last !) |
                                            |                 | By contrast, the GNU General Public License is intended to guarantee your freedom... |             |
                                            |  (^|(?<=\W))\w  | -  -         -   -   -       -      -       -  -        -  -         -    -          |   CORRECT   |
                                            |                 |                                                                                      |             |
                                            ==================+======================================================================================+==============
                                            |                 |                                                                                      |             |
                                            |                 | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT ! |
                                            |  .\b            |  --       - -  --  --      --     --      -- --       -- --        --   --      -    |             |
                                            |                 |                                                                                      |             |
                                            +-----------------+--------------------------------------------------------------------------------------+-------------+
                                            |                 |                                                                                      |             |
                                            |                 |                                                                                      |             |
                                            |  .((?=\W)|$)    | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT ! |
                                            |  .((?!\w)|$)    |  -        --   -   -       -      -       -  -        -  -         -    -       ---- |             |
                                            |                 |                                                                                      |             |
                                            |                 |                                                                                      |             |
                                            +-----------------+--------------------------------------------------------------------------------------+-------------+
                                            |                 |                                                                                      |             |
                                            |  .\>            |                                                                                      |             |
                                            |  .[[:>:]]       |                                                                                      |             |
                                            |                 |                                                                                      |             |
                                            |  \w\b           | By contrast, the GNU General Public License is intended to guarantee your freedom... |   CORRECT   |
                                            |  \w\>           |  -        -    -   -       -      -       -  -        -  -         -    -       -    |             |
                                            |  \w[[:>:]]      |                                                                                      |             |
                                            |  \w((?=\W)|$)   |                                                                                      |             |
                                            |  \w((?!\w)|$)   |                                                                                      |             |
                                            |                 |                                                                                      |             |
                                            ========================================================================================================================
                                            

                                            From that table, it is clear that the N++ Boost engine handles these assertions in a rather weird way!

                                            To be consistent, only two regexes, with similar syntax, should be used:

                                            • The regex (^|(?<=\W))\w, which matches the FIRST character of a word

                                            • The regex \w((?=\W)|$), which matches the LAST character of a word

                                            => The regex (^|(?<=\W))\w|\w((?=\W)|$) matches the first AND the last characters of a word
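
As a quick cross-check of the three regexes above, here is a minimal sketch using Python's standard `re` module on the sample sentence from the table. Note that this is Python's engine, not the Boost engine used by N++, so it only illustrates the intended matches and says nothing about how Boost handles `\<`, `\>` or `\b`. The helper name `matched_chars` is made up for this example.

```python
import re

# Sample sentence taken from the table above.
TEXT = "By contrast, the GNU General Public License is intended to guarantee your freedom..."

# The regexes recommended in the post, unchanged.
FIRST = re.compile(r"(^|(?<=\W))\w")               # first character of each word
LAST = re.compile(r"\w((?=\W)|$)")                 # last character of each word
BOTH = re.compile(r"(^|(?<=\W))\w|\w((?=\W)|$)")   # first and last characters


def matched_chars(pattern, text):
    """Return the whole match (group 0) for every hit, in order.

    finditer() is used rather than findall(), because findall() would
    return the content of the capturing groups instead of the full match.
    """
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print("first:", "".join(matched_chars(FIRST, TEXT)))
    print("last :", "".join(matched_chars(LAST, TEXT)))
    print("both :", "".join(matched_chars(BOTH, TEXT)))
```

One thing to keep in mind with the combined regex: for a one-letter word, the first alternative already consumes the single character, so that word yields one match, not two.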

                                            Best Regards,

                                            guy038
