    Pythonscript search different than N++ search when using \< and leading uppercase

    • Alan Kilborn

      When I run this Pythonscript code:

      '''
      print         01
      Print         02
      PrinT         03
      prinT         04
       print        05
       Print        06
       Print        07
       prinT        08
      '''
      
      matches = []
      
      editor.research(r'(?i)\<print\>\s*\d+', lambda m: matches.append(m.span()))
      
      for (s,e) in matches: print editor.getTextRange(s, e)
      

      It only matches the “print” lines with these numbers: 01, 04, 05, 08. The unmatched lines all contain an uppercase “P” in “print”.

      However, if I do an interactive Notepad++ search using the same regular expression, it matches all eight print lines. It also matches all eight lines if I try it in RegexBuddy.

      Any ideas on why this is different? I always thought that Pythonscript’s editor.research() used the same engine as Notepad++'s interactive Find, but these results seem to indicate otherwise.

      It seems to be related to the \< and \> word boundary specifiers. If I switch to \b then it works as expected.

      • Claudia Frank @Alan Kilborn

        @Alan-Kilborn

        It is using the same engine but, as you already said, there seems to be a difference
        in the implementation, unfortunately.

        I can’t really say what it is; one would need to diff both implementations.

        Cheers
        Claudia

        • Claudia Frank @Alan Kilborn

          @Alan-Kilborn

          Actually, it looks like the issue is with the (?i), because if you
          set ignore-case explicitly by using

          editor.research(b'(?i)\<print\>\s*\d+', lambda m: matches.append(m.span()),2)
          

          then you do get the same results as the Notepad++ search.
          Note: 2 = re.I

          Cheers
          Claudia

          • Alan Kilborn @Claudia Frank

            @Claudia-Frank

            Thanks for your input, Claudia. I’m not sure if your findings make me feel better about it, or worse! :(

            • Claudia Frank @Alan Kilborn

              @Alan-Kilborn

              I’m in the same mood.

              I will try to work out what npp is really doing under the hood.
              Maybe it just replaces the (?i) with flags. If so, the obvious solution
              when using pythonscript would be to parse the regex for the modifiers and
              set the flags explicitly, to get the same results.
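
A minimal sketch of that idea, using only Python's standard `re` module. The helper name `split_case_flag` is hypothetical (it is not part of PythonScript or Notepad++), and it assumes only a single leading `(?i)` or `(?-i)` needs handling; any other inline modifier is left untouched.

```python
import re

# Integer value of the ignore-case flag, so plain int arithmetic can be used.
IGNORECASE = int(re.IGNORECASE)

# Hypothetical helper: recognizes only one leading (?i) or (?-i) group.
_LEADING_CASE_FLAG = re.compile(r'^\(\?(-?)i\)')

def split_case_flag(pattern, default_flags=0):
    """Return (pattern_without_prefix, flags) for a leading (?i) / (?-i)."""
    m = _LEADING_CASE_FLAG.match(pattern)
    if m is None:
        return pattern, default_flags
    rest = pattern[m.end():]
    if m.group(1):
        # (?-i): clear the ignore-case bit
        return rest, default_flags & ~IGNORECASE
    # (?i): set the ignore-case bit
    return rest, default_flags | IGNORECASE

pattern, flags = split_case_flag(r'(?i)\<print\>\s*\d+')
print(pattern)
print(flags == IGNORECASE)
```

In a script, the returned pair could then be passed on in the same positions used elsewhere in this thread, i.e. the pattern as the first argument to editor.research() and the flags as the third.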

              I will follow-up on this.

              Cheers
              Claudia

              • MAPJe71

                Python 2.7.x does not interpret \< and \> as word boundaries.
                Verified it with RegexBuddy by changing the Application (Regex Flavor) from boost::regex 1.58-1.59 to python 2.7.
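
For illustration, here is a small check with Python's own `re` module (not the Boost engine, and not PythonScript's editor object). The sample string is made up for this sketch. As far as I know, `re` simply treats `\<` and `\>` as escaped literal angle brackets, while `\b` is the word-boundary assertion it does understand.

```python
import re

# Made-up sample text for this sketch.
text = 'print <print> Print'

# \< and \> are read as literal '<' and '>' by Python's re module,
# so only the bracketed occurrence is found.
literal = [m.span() for m in re.finditer(r'\<print\>', text)]

# \b is a real word boundary in re; with (?i) all three words are found.
boundary = [m.span() for m in re.finditer(r'(?i)\bprint\b', text)]

print(literal)
print(boundary)
```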

                • Claudia Frank @MAPJe71

                  @MAPJe71

                  Yes, but as far as I know research is not using the Python engine; it is using the Boost engine.

                  Cheers
                  Claudia

                  • Claudia Frank

                    Quick update:
                    I think I found the difference, and some glitch (depending on your point of view).

                    When searching with the npp find/replace dialog, the default is ignorecase.
                    The matchcase flag is only set if the checkbox is checked (makes sense).

                    When searching via pythonscript and research, the default is matchcase.
                    Case is ignored only if the flag re.I is provided (again, makes sense).

                    Our assumption was that, regardless of what has been selected, using
                    (?i) means the search is treated as ignorecase, and this is not the case.

                    If you do a search via the npp dialog with the matchcase checkbox checked and (?i),
                    you will see that it uses matchcase instead of ignorecase.

                    So, at the moment, I would say both behave the same.

                    I will do some further tests later.

                    Cheers
                    Claudia

                    • Scott Sumner @Claudia Frank

                      @Claudia-Frank said:

                      If you do a search via the npp dialog using matchcase checkbox and (?i)
                      you will see that it uses the matchcase instead of ignorecase.

                      Okay, so now I’m confused. If I try a very simple interactive search (forget the \< thing discussed earlier), where I lead off the Find-what box with (?i), it makes no difference whether the “match case” checkbox is checked or not, my simple search gets matches that ignore case. @Claudia-Frank , doesn’t this result go against what you just said?

                      • Claudia Frank @Scott Sumner

                        @Scott-Sumner

                        No, if you take into account that research gives the same result, but
                        yes, if you take that statement in general.
                        I’m comparing the results of the npp dialog search and the python script research.

                        But you are right, it is confusing!

                        From testing, it seems that research and the npp search behave the same
                        if the flags are set the same.

                        Cheers
                        Claudia

                        • Alan Kilborn @Claudia Frank

                          @Claudia-Frank , do you still have more testing to do, or have you taken it as far as it can go? If you are done, then I have questions. :)

                          The main question is: should the golden rule, when writing Pythonscript, be to forget about embedding (?i) at the front of a regular expression string, and instead set the flags parameter to re.IGNORECASE (or re.I, or even 2) to get the desired behavior?

                          I always liked using (?i) because that way, when reading a regex from left to right, I could prepare myself properly, rather than having to look for a somewhat disconnected flags parameter. But, if things aren’t going to work right, I can learn another way.

                          However, I’m still confused about what I should do to make things work right in all cases… :(

                          • Claudia Frank @Alan Kilborn

                            @Alan-Kilborn ,
                            at the moment, it looks like my last statement, that the results match when the flags are set the same, is correct.
                            But this could mean either that I just haven’t found the regex which breaks it, or that it is simply true.
                            I tend toward the latter :-)

                            Does it matter in regards to your script?
                            I assume it does only if you want to compare with the interactive search dialog, or
                            if you create the regex with the dialog and afterwards use it in a script.

                            If my statement about the flags is true, and you want the script to behave exactly the same
                            as npp’s find dialog, then you need to provide the same flags that npp does.

                            I do see the advantage of using the modifiers, but it looks like neither pythonscript
                            nor npp gets it right in terms of the issue you found.

                            Cheers
                            Claudia

                            • Claudia Frank

                              OK, from my point of view it looks like it is as I’ve already described.
                              If the npp find dialog and pythonscript research use the same flags and regex,
                              the result is the same.

                              So, python script users need to be aware that npp sets the ignorecase flag
                              by default.

                              If someone finds different behavior - please report.

                              Cheers
                              Claudia

                              • Alan Kilborn @Claudia Frank

                                @Claudia-Frank said:

                                Does it matter in regards to your script?

                                Well…what matters in regards to my script is that the regex search works correctly. The discussion about interactive versus scripted search and embedded flags versus separate flags is nice, but…

                                The bottom line is that the scripted search has a problem, and that problem is seemingly related to the use of \< and \> in the regex, which I would like to be able to use. So, okay, we can “fix” that by working around it by playing with the flags stuff, and that is all fine, I guess.

                                But should that become a general rule for using Pythonscript’s editor.research() function, based upon someone finding one weird case? Maybe there is no great answer to that at this point, but I’m attempting to create something general purpose, where I can’t test all possible regexes for how they work at coding time, as they are input at run-time.

                                So I’m left wondering if the best thing to do when coding with editor.research() is to never embed (?i) or (?-i) in the regex, and instead always use the flags parameter. Perhaps based upon the data we have at this moment, that is the best course of action.

                                • Claudia Frank @Alan Kilborn

                                  @Alan-Kilborn

                                  Well…what matters in regards to my script is that the regex search works correctly.

                                  More philosophically: what does “correct” mean when we talk about regex?
                                  As long as the same regex can be interpreted differently by different regex engines,
                                  how can we be sure what is right or wrong?

                                  I agree; I too assumed that using the modifiers overrules the flags, but with the regex
                                  you used we see that it doesn’t. I don’t know if this is a bug in how the boost regex engine
                                  is integrated or a bug in the boost regex engine itself. I don’t think I will go into that much detail.

                                  Concerning the correctness of the scripts - you found one misbehavior, I’m quite certain that others exist.

                                  The bottom line is that the scripted search has a problem, and that problem is seemingly related to the use of \< and \> in the regex

                                  Just to be clear, npp and pythonscript both behave the same. It is not only python script that does this.

                                  which I would like to be able to use. So, okay, we can “fix” that by working around it by playing with the flags stuff, and that is all fine, I guess.

                                  From the tests, it looks like only the “start of word” and “end of word” boundaries behave strangely.
                                  If you stop using these particular items, you might already have solved it.
                                  Until you discover the next issue.

                                  But should that become a general rule for using Pythonscript’s editor.research() function, based upon someone finding one weird case?
                                  Maybe there is no great answer to that at this point, but I’m attempting to create something general purpose,
                                  where I can’t test all possible regexes for how they work at coding time, as they are input at run-time.
                                  So I’m left wondering if the best thing to do when coding with editor.research() is to never embed (?i) or (?-i) in the regex,
                                  and instead always use the flags parameter. Perhaps based upon the data we have at this moment, that is the best course of action.

                                  Personally, I’m now avoiding the modifiers in scripts and using flags when needed.

                                  Cheers
                                  Claudia

                                  • MAPJe71

                                    Wondering if it’s only an issue with editor.research( ... ), as I have been using modifiers with editor.rereplace( ... ) successfully.

                                    • Claudia Frank @MAPJe71

                                      @MAPJe71

                                      Yep - same issue.

                                      New doc with the word Print only

                                      editor.rereplace('(?i)\<print\>', 'PRINT')
                                      

                                      fails where

                                      editor.rereplace('\<print\>', 'PRINT', 2)
                                      

                                      works

                                      UUhhh - I should avoid magic numbers - 2=re.I
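
As a small aside on the magic number: in Python's standard `re` module, `re.I` is an alias of `re.IGNORECASE`, and its integer value is 2. This sketch only checks the standard library; that PythonScript accepts these constants in the flags position is taken from the posts above, not verified here.

```python
import re

# re.I and re.IGNORECASE are the same flag.
same_flag = (re.I == re.IGNORECASE)

# The integer behind the named constant, i.e. the "magic number" used above.
ignore_case = int(re.IGNORECASE)

print(same_flag, ignore_case)
```

Passing the named constant instead of a bare 2 keeps the intent readable at the call site.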

                                      Cheers
                                      Claudia

                                      • MAPJe71

                                        UUhhh - I should avoid magic numbers - 2=re.I

                                        Damn right!!! LOL

                                        • guy038

                                          Hi, All,

                                          I did some tests, using the classic Find dialog, with the sample text below:

                                          xyz
                                          xYz
                                          xyZ
                                          xYZ
                                          Xyz
                                          XYz
                                          XyZ
                                          XYZ
                                          -----
                                          1xyz
                                          1xYz
                                          1xyZ
                                          1xYZ
                                          1Xyz
                                          1XYz
                                          1XyZ
                                          1XYZ
                                          -----
                                          xyz9
                                          xYz9
                                          xyZ9
                                          xYZ9
                                          Xyz9
                                          XYz9
                                          XyZ9
                                          XYZ9
                                          -----
                                          1xyz9
                                          1xYz9
                                          1xyZ9
                                          1xYZ9
                                          1Xyz9
                                          1XYz9
                                          1XyZ9
                                          1XYZ9
                                          

                                          Then I tested the following regexes:

                                          • xYZ
                                          • xYZ\>
                                          • xYZ\b
                                          • \<xYZ
                                          • \bxYZ
                                          • \<xYZ\>
                                          • \bxYZ\b
                                          • (^|(?<=\W))xYZ((?=\W)|$)

                                          under each of the following six conditions:

                                          • With the Match case option ON
                                          • With the Match case option OFF
                                          • Preceded by (?i) and with the Match case option ON
                                          • Preceded by (?i) and with the Match case option OFF
                                          • Preceded by (?-i) and with the Match case option ON
                                          • Preceded by (?-i) and with the Match case option OFF
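
The test setup above can be sketched in plain Python, for anyone who wants to regenerate the sample text and enumerate the combinations. The names `case_variants`, `test_text` and `cases` are made up for this sketch, and the variants may come out in a different order than in the listing above. It only builds the inputs; it does not run any search.

```python
import itertools

def case_variants(word):
    """Return every upper/lower-case spelling of word."""
    choices = [(ch.lower(), ch.upper()) for ch in word]
    return [''.join(combo) for combo in itertools.product(*choices)]

variants = case_variants('xyz')

# Four blocks: bare word, leading digit, trailing digit, both.
blocks = [
    variants,
    ['1' + v for v in variants],
    [v + '9' for v in variants],
    ['1' + v + '9' for v in variants],
]
test_text = '\n-----\n'.join('\n'.join(block) for block in blocks)

regexes = [
    r'xYZ', r'xYZ\>', r'xYZ\b', r'\<xYZ', r'\bxYZ',
    r'\<xYZ\>', r'\bxYZ\b', r'(^|(?<=\W))xYZ((?=\W)|$)',
]
prefixes = ['', '(?i)', '(?-i)']   # no modifier, (?i), (?-i)
match_case_options = [True, False]  # state of the Match case checkbox

# One entry per (full regex, Match case state) pair.
cases = [
    (prefix + regex, match_case)
    for prefix in prefixes
    for match_case in match_case_options
    for regex in regexes
]

print(len(test_text.splitlines()), len(cases))
```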

                                          I obtained the following six tables, where:

                                          • A correct match is indicated by a * character

                                          • An incorrect match is indicated by the letter E (Error)


                                          +-------+------------------------------------------------------------------------------------+
                                          |       |            Option "Match case"  ON           and           Regex, below            |
                                          | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                          |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | xyz   |     |       |       |       |       |         |         |                          |
                                          | xYz   |     |       |       |       |       |         |         |                          |
                                          | xyZ   |     |       |       |       |       |         |         |                          |
                                          | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | Xyz   |     |       |       |       |       |         |         |                          |
                                          | XYz   |     |       |       |       |       |         |         |                          |
                                          | XyZ   |     |       |       |       |       |         |         |                          |
                                          | XYZ   |     |       |       |       |       |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | 1xyz  |     |       |       |       |       |         |         |                          |
                                          | 1xYz  |     |       |       |       |       |         |         |                          |
                                          | 1xyZ  |     |       |       |       |       |         |         |                          |
                                          | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1Xyz  |     |       |       |       |       |         |         |                          |
                                          | 1XYz  |     |       |       |       |       |         |         |                          |
                                          | 1XyZ  |     |       |       |       |       |         |         |                          |
                                          | 1XYZ  |     |       |       |       |       |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | xyz9  |     |       |       |       |       |         |         |                          |
                                          | xYz9  |     |       |       |       |       |         |         |                          |
                                          | xyZ9  |     |       |       |       |       |         |         |                          |
                                          | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | Xyz9  |     |       |       |       |       |         |         |                          |
                                          | XYz9  |     |       |       |       |       |         |         |                          |
                                          | XyZ9  |     |       |       |       |       |         |         |                          |
                                          | XYZ9  |     |       |       |       |       |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | 1xyz9 |     |       |       |       |       |         |         |                          |
                                          | 1xYz9 |     |       |       |       |       |         |         |                          |
                                          | 1xyZ9 |     |       |       |       |       |         |         |                          |
                                          | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                          | 1Xyz9 |     |       |       |       |       |         |         |                          |
                                          | 1XYz9 |     |       |       |       |       |         |         |                          |
                                          | 1XyZ9 |     |       |       |       |       |         |         |                          |
                                          | 1XYZ9 |     |       |       |       |       |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          
                                          
                                          +-------+------------------------------------------------------------------------------------+
                                          |       |            Option "Match case"  OFF          and           Regex, below            |
                                          | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                          |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | xYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | xyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | Xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | XYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | XyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | XYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | 1xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1xYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1xyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1Xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1XYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1XyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1XYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | xYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | xyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | Xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | XYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | XyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | XYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | 1xyz9 |  *  |       |       |       |       |         |         |                          |
                                          | 1xYz9 |  *  |       |       |       |       |         |         |                          |
                                          | 1xyZ9 |  *  |       |       |       |       |         |         |                          |
                                          | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                          | 1Xyz9 |  *  |       |       |       |       |         |         |                          |
                                          | 1XYz9 |  *  |       |       |       |       |         |         |                          |
                                          | 1XyZ9 |  *  |       |       |       |       |         |         |                          |
                                          | 1XYZ9 |  *  |       |       |       |       |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          
                                          
                                          +-------+------------------------------------------------------------------------------------+
                                          |       |       Option "Match case"  ON       and       Regex, below, PRECEDED by (?i)       |
                                          | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                          |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | xYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | xyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                          | Xyz   |  *  |   *   |   *   |   E   |   *   |    E    |    *    |            *             |
                                          | XYz   |  *  |   *   |   *   |   E   |   *   |    E    |    *    |            *             |
                                          | XyZ   |  *  |   *   |   *   |   E   |   *   |    E    |    *    |            *             |
                                          | XYZ   |  *  |   *   |   *   |   E   |   *   |    E    |    *    |            *             |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | 1xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1xYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1xyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1Xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1XYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1XyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          | 1XYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | xYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | xyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                          | Xyz9  |  *  |       |       |   E   |   *   |         |         |                          |
                                          | XYz9  |  *  |       |       |   E   |   *   |         |         |                          |
                                          | XyZ9  |  *  |       |       |   E   |   *   |         |         |                          |
                                          | XYZ9  |  *  |       |       |   E   |   *   |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          | 1xyz9 |  *  |       |       |       |       |         |         |                          |
                                          | 1xYz9 |  *  |       |       |       |       |         |         |                          |
                                          | 1xyZ9 |  *  |       |       |       |       |         |         |                          |
                                          | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                          | 1Xyz9 |  *  |       |       |       |       |         |         |                          |
                                          | 1XYz9 |  *  |       |       |       |       |         |         |                          |
                                          | 1XyZ9 |  *  |       |       |       |       |         |         |                          |
                                          | 1XYZ9 |  *  |       |       |       |       |         |         |                          |
                                          +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                          
                                          
                                          
                                          • guy038
                                            Last edited by guy038

                                            I need to split this post into two parts because it exceeds 16384 characters!

                                            +-------+------------------------------------------------------------------------------------+
                                            |       |       Option "Match case"  OFF       and       Regex, below, PRECEDED by (?i)      |
                                            | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                            |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | xYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | xyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | Xyz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | XYz   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | XyZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | XYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | 1xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1xYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1xyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1Xyz  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1XYz  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1XyZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1XYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | xYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | xyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | Xyz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | XYz9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | XyZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | XYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | 1xyz9 |  *  |       |       |       |       |         |         |                          |
                                            | 1xYz9 |  *  |       |       |       |       |         |         |                          |
                                            | 1xyZ9 |  *  |       |       |       |       |         |         |                          |
                                            | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                            | 1Xyz9 |  *  |       |       |       |       |         |         |                          |
                                            | 1XYz9 |  *  |       |       |       |       |         |         |                          |
                                            | 1XyZ9 |  *  |       |       |       |       |         |         |                          |
                                            | 1XYZ9 |  *  |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            
                                            
                                            +-------+------------------------------------------------------------------------------------+
                                            |       |       Option "Match case"  ON       and       Regex, below, PRECEDED by (?-i)      |
                                            | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                            |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | xyz   |     |       |       |       |       |         |         |                          |
                                            | xYz   |     |       |       |       |       |         |         |                          |
                                            | xyZ   |     |       |       |       |       |         |         |                          |
                                            | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | Xyz   |     |       |       |       |       |         |         |                          |
                                            | XYz   |     |       |       |       |       |         |         |                          |
                                            | XyZ   |     |       |       |       |       |         |         |                          |
                                            | XYZ   |     |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | 1xyz  |     |       |       |       |       |         |         |                          |
                                            | 1xYz  |     |       |       |       |       |         |         |                          |
                                            | 1xyZ  |     |       |       |       |       |         |         |                          |
                                            | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1Xyz  |     |       |       |       |       |         |         |                          |
                                            | 1XYz  |     |       |       |       |       |         |         |                          |
                                            | 1XyZ  |     |       |       |       |       |         |         |                          |
                                            | 1XYZ  |     |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | xyz9  |     |       |       |       |       |         |         |                          |
                                            | xYz9  |     |       |       |       |       |         |         |                          |
                                            | xyZ9  |     |       |       |       |       |         |         |                          |
                                            | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | Xyz9  |     |       |       |       |       |         |         |                          |
                                            | XYz9  |     |       |       |       |       |         |         |                          |
                                            | XyZ9  |     |       |       |       |       |         |         |                          |
                                            | XYZ9  |     |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | 1xyz9 |     |       |       |       |       |         |         |                          |
                                            | 1xYz9 |     |       |       |       |       |         |         |                          |
                                            | 1xyZ9 |     |       |       |       |       |         |         |                          |
                                            | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                            | 1Xyz9 |     |       |       |       |       |         |         |                          |
                                            | 1XYz9 |     |       |       |       |       |         |         |                          |
                                            | 1XyZ9 |     |       |       |       |       |         |         |                          |
                                            | 1XYZ9 |     |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            
                                            
                                            +-------+------------------------------------------------------------------------------------+
                                            |       |       Option "Match case"  OFF       and       Regex, below, PRECEDED by (?-i)     |
                                            | Text  |-----+-------+-------+-------+-------+---------+---------+--------------------------|
                                            |       | xYZ | xYZ\> | xYZ\b | \<xYZ | \bxYZ | \<xYZ\> | \bxYZ\b | (^|(?<=\W))xYZ((?=\W)|$) |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | xyz   |     |       |       |       |       |         |         |                          |
                                            | xYz   |     |       |       |       |       |         |         |                          |
                                            | xyZ   |     |       |       |       |       |         |         |                          |
                                            | xYZ   |  *  |   *   |   *   |   *   |   *   |    *    |    *    |            *             |
                                            | Xyz   |     |       |       |       |       |         |         |                          |
                                            | XYz   |     |       |       |       |       |         |         |                          |
                                            | XyZ   |     |       |       |       |       |         |         |                          |
                                            | XYZ   |     |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | 1xyz  |     |       |       |       |       |         |         |                          |
                                            | 1xYz  |     |       |       |       |       |         |         |                          |
                                            | 1xyZ  |     |       |       |       |       |         |         |                          |
                                            | 1xYZ  |  *  |   *   |   *   |       |       |         |         |                          |
                                            | 1Xyz  |     |       |       |       |       |         |         |                          |
                                            | 1XYz  |     |       |       |       |       |         |         |                          |
                                            | 1XyZ  |     |       |       |       |       |         |         |                          |
                                            | 1XYZ  |     |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | xyz9  |     |       |       |       |       |         |         |                          |
                                            | xYz9  |     |       |       |       |       |         |         |                          |
                                            | xyZ9  |     |       |       |       |       |         |         |                          |
                                            | xYZ9  |  *  |       |       |   *   |   *   |         |         |                          |
                                            | Xyz9  |     |       |       |       |       |         |         |                          |
                                            | XYz9  |     |       |       |       |       |         |         |                          |
                                            | XyZ9  |     |       |       |       |       |         |         |                          |
                                            | XYZ9  |     |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            | 1xyz9 |     |       |       |       |       |         |         |                          |
                                            | 1xYz9 |     |       |       |       |       |         |         |                          |
                                            | 1xyZ9 |     |       |       |       |       |         |         |                          |
                                            | 1xYZ9 |  *  |       |       |       |       |         |         |                          |
                                            | 1Xyz9 |     |       |       |       |       |         |         |                          |
                                            | 1XYz9 |     |       |       |       |       |         |         |                          |
                                            | 1XyZ9 |     |       |       |       |       |         |         |                          |
                                            | 1XYZ9 |     |       |       |       |       |         |         |                          |
                                            +-------+-----+-------+-------+-------+-------+---------+---------+--------------------------+
                                            

                                            From these results, we can deduce that the N++ Boost regex engine fails to match 4 (or 8) cases ONLY IF the four conditions below occur simultaneously:

                                            • The Match case option, of the Find dialog, is ON

                                            • A (?i) modifier starts the regex

                                            • The regex begins with the \< assertion

                                            • The text to match begins with an UPPERCASE letter
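The four conditions above can be written down as a small predicate. This is only a sketch of the report, not of the Boost engine: the names `TEXTS`, `should_match` and `reported_miss` are made up for illustration, Python's `re` module stands in for a correctly working engine, and because `re` has no `\<` and `\>`, they are swapped for `\b` (which is only safe for simple patterns like the ones in the tables).

```python
import re
from itertools import product

# The 32 test strings of the tables: 8 case variants of "xyz",
# bare, prefixed with "1", suffixed with "9", or both.
CORES = [''.join(p) for p in product('xX', 'yY', 'zZ')]
TEXTS = [pre + core + suf
         for pre, suf in (('', ''), ('1', ''), ('', '9'), ('1', '9'))
         for core in CORES]


def should_match(regex, text):
    """What a correct engine is expected to answer (Python re as stand-in)."""
    portable = regex.replace(r'\<', r'\b').replace(r'\>', r'\b')
    return re.search(portable, text) is not None


def reported_miss(match_case_on, regex, text):
    """True when the match is expected but all four reported conditions hold."""
    return bool(match_case_on                         # 1. Match case is ON
                and regex.startswith('(?i)')          # 2. (?i) starts the regex
                and regex[4:].startswith(r'\<')       # 3. then comes \<
                and text[:1].isupper()                # 4. text starts uppercase
                and should_match(regex, text))


for regex in (r'(?i)\<xYZ', r'(?i)\<xYZ\>'):
    missed = [t for t in TEXTS if reported_miss(True, regex, t)]
    print('%s -> %d missed: %s' % (regex, len(missed), ' '.join(missed)))
```

Under these assumptions the predicate flags 8 strings for `(?i)\<xYZ` and 4 for `(?i)\<xYZ\>`, which is the "4 (or 8) cases" mentioned above, and none when the Match case option is OFF.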


                                            Luckily :

                                            • The regex \<xYZ can be replaced by the regex \bxYZ

                                            • The regex \<xYZ\> can be replaced by the regex \bxYZ\b
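For a script, the same workaround could be applied automatically before the pattern is handed to the search. The helper below is a hypothetical sketch (`to_b_boundaries` is not part of Pythonscript or Notepad++). It swaps `\<` and `\>` for `\b` while leaving an escaped backslash (`\\`) alone. It does not understand character classes such as `[\<]`, and `\b` is only a safe substitute when the anchor sits directly next to a word character, as in the regexes of this thread. The check at the end uses Python's own `re` module on the eight sample lines from the first post, not the Boost engine.

```python
import re

# Either an escaped backslash (kept as-is) or one of the \< / \> anchors.
_TOKEN = re.compile(r'\\\\|\\[<>]')


def to_b_boundaries(pattern):
    """Replace \\< and \\> by \\b, skipping escaped backslashes."""
    def swap(m):
        token = m.group(0)
        return token if token == '\\\\' else r'\b'
    return _TOKEN.sub(swap, pattern)


SAMPLE = [
    'print         01', 'Print         02', 'PrinT         03', 'prinT         04',
    ' print        05', ' Print        06', ' Print        07', ' prinT        08',
]

fixed = to_b_boundaries(r'(?i)\<print\>\s*\d+')
matched = [line for line in SAMPLE if re.search(fixed, line)]
print(fixed)
print('%d of %d lines matched' % (len(matched), len(SAMPLE)))
```

In a Pythonscript the rewritten pattern would then be passed to `editor.research()` in place of the original one.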

                                            And note that :

                                            • The assertion \< may be replaced by the assertion (^|(?<=\W))

                                            • The assertion \> may be replaced by the assertion ((?=\W)|$)
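The look-around forms can be compared with `\b` on the test strings of the tables. This is a quick sanity check written with Python's `re` module, which is an assumption on my side: it is not the Boost engine used by Notepad++, but it accepts both spellings. The names `TEXTS`, `EXTRA`, `WITH_B` and `WITH_LOOKAROUND` are only illustrative.

```python
import re
from itertools import product

CORES = [''.join(p) for p in product('xX', 'yY', 'zZ')]
TEXTS = [pre + core + suf
         for pre, suf in (('', ''), ('1', ''), ('', '9'), ('1', '9'))
         for core in CORES]
# A few extra strings with non-word characters and an underscore around the word.
EXTRA = [' xyz ', '-XYZ,', '_xyz', 'xyz_']

WITH_B = re.compile(r'(?i)\bxYZ\b')
WITH_LOOKAROUND = re.compile(r'(?i)(^|(?<=\W))xYZ((?=\W)|$)')

# Both spellings should accept and reject exactly the same strings.
same = all(bool(WITH_B.search(t)) == bool(WITH_LOOKAROUND.search(t))
           for t in TEXTS + EXTRA)
hits = [t for t in TEXTS + EXTRA if WITH_LOOKAROUND.search(t)]
print('same verdicts: %s' % same)
print('matching strings: %r' % hits)
```

On this small set both patterns agree: only the eight bare case variants and the two strings delimited by non-word characters are matched.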

                                            Cheers

                                            guy038

                                            BTW, Claudia, when you said :

                                            I agree, I assumed too, that using the modifiers overrules the flags but with the regex
                                            you used we see it isn’t.

                                            I disagree ! Indeed :

                                            Considering the text below :

                                            xyz
                                            xYz
                                            xyZ
                                            xYZ
                                            Xyz
                                            XYz
                                            XyZ
                                            XYZ
                                            

                                            As the tables show, the two regexes (?i)\<xYZ and (?i)\<xYZ\>, with the Match case option ON, match the first four cases only.

                                            But the two regexes (?i)\<XYZ and (?i)\<XYZ\>, with the Match case option ON, ALSO match the first four cases only! If the Match case option overrode the (?i) modifier, they would match the single string XYZ instead.

                                            So I do think that the in-line modifiers (?i) and (?-i) ALWAYS have priority over the Match case option.

                                            And, like you, I rather think that it's just a bug [ in the implementation ] of the Boost regex engine!

                                            Besides, the two regexes (?i)\bxYZ and (?i)\bxYZ\b, with the Match case option ON, correctly match all eight cases above of the string "xyz" :-)) => The (?i) modifier forces the case-insensitive search!
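The argument can be laid out as three competing predictions for the regex (?i)\<XYZ with the Match case option ON. This is only a model, under the assumption that Python's `re` (with `\b` standing in for `\<`) behaves like a correct engine. The set `observed` is the result reported in this post, not something the code measures, and all names are illustrative.

```python
import re

WORDS = ['xyz', 'xYz', 'xyZ', 'xYZ', 'Xyz', 'XYz', 'XyZ', 'XYZ']

# Result reported in this post for (?i)\<XYZ with Match case ON.
observed = set(WORDS[:4])

# Hypothesis A: the Match case option overrides (?i) -> case-sensitive search.
pred_a = {w for w in WORDS if re.search(r'\bXYZ', w)}

# Hypothesis B: (?i) wins and the engine has no bug.
pred_b = {w for w in WORDS if re.search(r'(?i)\bXYZ', w)}

# Hypothesis C: (?i) wins, but \< fails in front of an uppercase letter.
pred_c = {w for w in pred_b if not w[0].isupper()}

for name, pred in (('A', pred_a), ('B', pred_b), ('C', pred_c)):
    print('hypothesis %s agrees with the observation: %s' % (name, pred == observed))
```

Only hypothesis C reproduces the reported result, which is why the behaviour looks like a bug around `\<` rather than the Match case option taking precedence over the modifier.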

                                            The Community of users of the Notepad++ text editor.