    pythonscript: any ready pyscript to replace one huge set of regex/ phrases with others?

    Notepad++ & Plugin Development · 58 posts · 10 posters · 27.4k views
    • Alan Kilborn @guy038

      @guy038

      Yea, probably a good idea. Trailing blanks are hard to see without having visible line ends turned on (yuck!), or doing them as \x20 or, as you like, a trailing delimiter.

      Glad you are enjoying the script and your own script mods!

      • chcg

        Would you like to create a PR of the script to be added to https://github.com/bruderstein/PythonScript/tree/master/scripts/Samples? Otherwise I could also add the last version of @guy038 , if that is ok for you.

        I know the installation of PythonScript with N++ > 7.6.x is right now a horror. I hope I will find some time to get it compatible with the PluginAdmin changes. The biggest problem known so far is the relocation of python27.dll into the plugin folder.

        • Meta Chuh (moderator) @chcg

          @chcg

          I know the installation of PythonScript with N++ > 7.6.x is right now a horror.

          i’ve made a little guide and summary of all paths, while being in a chat with peter, for the installed version here

          and one for the portable version here

          maybe you can use it, if you need to help someone.

          The biggest problem known so far is the relocation of python27.dll into the plugin folder.

          i suppose so, unless the plugin spawns a process with a different relative path, not bound to notepad++.exe’s path, or maybe even a static python27 library in the spawn.

          • guy038

            Hi, @alan-kilborn and All,

            I did some tests, with your script and, finally, the Python regex engine seems more reliable than our Boost regex engine ;-))

            Some bugs or limitations, present in our Boost implementation ( see the REMARK section of this FAQ, below )

            https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation

            do not occur anymore with the Python regex engine ;-))

            Indeed :

            • Characters located outside the BMP can be inserted in both the search and the replacement regex, either directly or with the syntax \x{HHHHHHHH}

            • The NUL character, \x{0000}, can be used in both the search and the replacement regex

            • Backward assertions such as \A seem correctly supported

            • Look-behind assertions are correctly handled, even when the look-behind overlaps the end of the previous match


            Seemingly, we’ll just lack, with the Python regex engine, the case modifiers, ( \u, \l, \U, \L and \E )

            These escaped sequences are available, with our Boost engine, in the replacement part. Refer to the address, below :

            https://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html#boost_regex.format.boost_format_syntax.escape_sequences

            For instance, against this text:

            This is simple test
            

            You may test the two regex S/R :

            SEARCH \w+

            REPLACE \u$0

            and

            SEARCH \w+

            REPLACE \U$0 $0\E <$0>

            AFAIK, they do not modify anything, ( I mean regarding case of characters ! ) when executed from a Python script :-((
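For comparison, Python's own re module (a separate engine from the one behind editor.rereplace) has no \u, \l, \U, \L or \E escapes in replacement templates either; the usual idiom there is to pass a function as the replacement. A minimal Python 3 sketch, with hypothetical helper names, that emulates the two S/R above on a plain string and does not touch the editor:

```python
import re

WORD = re.compile(r"\w+")


def capitalize_words(text):
    # Emulates REPLACE \u$0: uppercase only the first character of each match.
    return WORD.sub(lambda m: m.group(0)[:1].upper() + m.group(0)[1:], text)


def upper_pair_then_plain(text):
    # Emulates REPLACE \U$0 $0\E <$0>: uppercase "$0 $0", then append
    # " <$0>" with its case left alone.
    def repl(m):
        word = m.group(0)
        return (word + " " + word).upper() + " <" + word + ">"
    return WORD.sub(repl, text)


if __name__ == "__main__":
    print(capitalize_words("This is simple test"))
    print(upper_pair_then_plain("This is simple test"))
```

The function receives each match object, so any case logic can be applied per match without relying on engine-specific escapes.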

            Best Regards,

            guy038

            • Alan Kilborn @guy038

              @guy038 said:

              I did some tests, with your script and, finally, the Python regex engine seems more reliable than our Boost regex engine

              Can you show some examples of the Python regex engine testing you did?

              • Eko palypse @Alan Kilborn

                @guy038,

                the script provided by @Alan-Kilborn uses the boost regex implementation from the PythonScript plugin, which, as you’ve already shown, is implemented differently than with npp.

                • Alan Kilborn @Eko palypse

                  @Eko-palypse

                  Well that’s kinda what I was getting at by asking @guy038 that last question. I couldn’t tell from what he was saying if he was talking about the earlier script or if he had tried some real Python re.xxx functions for search and replace. Hence my question to him.

                  uses the boost regex implementation from the PythonScript plugin which is implemented differently than with npp

                  Is it truly, though? I always thought that it made calls back to whatever regex engine is in N++, but, hmmm, maybe not. Maybe I should check the source code. :)

                  • Eko palypse @Alan Kilborn

                    @Alan-Kilborn

                    From what I understand, yes, this is the case: it has its own boost::regex engine implemented
                    https://github.com/bruderstein/PythonScript/blob/d54a2b434ec2b51f0dbacd3828fc36a20533c2dc/PythonScript/src/Replacer.cpp

                    • guy038

                      Hi, @alan-kilborn, and All,

                      Alan, it’s just all the points, described in my previous post !


                      Characters located outside the BMP can be inserted in both the search and the replacement regex, either directly or with the syntax \x{HHHHHHHH}

                      From the text below :

                      🍬 = \x{1F36C}
                      🎂 = \x{1F382}
                      🎄 = \x{1F384}
                      🎅 = \x{1F385}
                      🎇 = \x{1F387}
                      🎺 = \x{1F3BA}
                      👼 = \x{1F47C}

                      with the Python regex engine, you can use :

                      SEARCH [\x{0001F36C}-\x{0001F47C}].+ or [\x{1F36C}-\x{1F47C}].+

                      REPLACE \x{1F385} = \\x{1F385}

                      So, with my modified script : @[\x{1F36C}-\x{1F47C}].+@\x{1F385} = \\x{1F385}@

                      and you get:

                      🎅 = \x{1F385}
                      🎅 = \x{1F385}
                      🎅 = \x{1F385}
                      🎅 = \x{1F385}
                      🎅 = \x{1F385}
                      🎅 = \x{1F385}
                      🎅 = \x{1F385}

                      For characters with code, above \x{FFFF}, you cannot do this kind of S/R with our Boost regex engine
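For anyone who wants to try the same replacement with Python's own re module rather than editor.rereplace, here is a small Python 3 sketch. As far as I know, re does not accept the \x{...} form, so the code points are written as \UXXXXXXXX instead; the variable names are only illustrative.

```python
import re

CODE_POINTS = [0x1F36C, 0x1F382, 0x1F384, 0x1F385, 0x1F387, 0x1F3BA, 0x1F47C]

# Rebuild the sample text: one "<char> = \x{HEX}" line per code point.
sample = "\n".join("%s = \\x{%X}" % (chr(cp), cp) for cp in CODE_POINTS)

# re spells code points above U+FFFF as \UXXXXXXXX (eight hex digits).
astral_line = re.compile(r"[\U0001F36C-\U0001F47C].+")

# A function replacement returns its string literally, so the backslash
# in "\x{1F385}" needs no extra template escaping.
result = astral_line.sub(lambda m: "\U0001F385 = \\x{1F385}", sample)

print(result)
```

Because `.` does not cross line ends by default, each of the seven lines is matched and replaced separately.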


                      The NUL character, \x{0000}, can be used in both the search and the replacement regex

                      For instance, you can execute the following S/R, with the Python regex engine :

                      SEARCH [\x20-\x7f]

                      REPLACE $0\x00

                      giving for the script : @[\x20-\x7f]@$0\x00@

                      This S/R cannot be run with our Boost regex engine, which just deletes all the characters
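The same idea can be tried on a plain string with Python's own re module. This is only a sketch with hypothetical helper names; a function replacement is used because, to my knowledge, re replacement templates do not understand \x escapes.

```python
import re


def append_nul(text):
    # Append a NUL character after every character in the range \x20-\x7f.
    return re.sub(r"[\x20-\x7f]", lambda m: m.group(0) + "\x00", text)


def strip_nul(text):
    # NUL is also usable on the search side.
    return re.sub(r"\x00", "", text)


if __name__ == "__main__":
    padded = append_nul("ab")
    print(repr(padded))
    print(repr(strip_nul(padded)))
```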


                      Backward assertions such as \A seem correctly supported

                      Just imagine the text “This is a test” in a new N++ tab and the regex S/R :

                      SEARCH \A.

                      REPLACE -

                      So, in the script, the syntax @\A.@-@

                      With the Python regex engine, we get the correct text -his is a test ! With our Boost regex engine, after clicking on the Replace All button, we, wrongly, obtain the text -------------- :-((
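To make the difference concrete, here is a sketch using Python's own re module on a plain string. The second function is a purely hypothetical model of how a "Replace All" could end up with the all-dashes result (restarting the search on the remaining text after every match); it is an assumption for illustration, not a confirmed description of what N++ does internally.

```python
import re


def replace_all_single_pass(pattern, repl, text):
    # One scan over the whole string: \A can only match at offset 0.
    return re.sub(pattern, repl, text)


def replace_all_restarting(pattern, repl, text):
    # Hypothetical model: after each match, search again in the remaining
    # text as if it were a fresh string, so \A re-anchors every time.
    out = []
    rest = text
    while rest:
        m = re.search(pattern, rest)
        if m is None or m.end() == m.start():
            break
        out.append(rest[:m.start()])
        out.append(m.expand(repl))
        rest = rest[m.end():]
    out.append(rest)
    return "".join(out)


if __name__ == "__main__":
    print(replace_all_single_pass(r"\A.", "-", "This is a test"))
    print(replace_all_restarting(r"\A.", "-", "This is a test"))
```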


                      Look-behind assertions are correctly handled, even when the look-behind overlaps the end of the previous match

                      Consider the text aaaabaaabaaa and the regex S/R :

                      SEARCH (?<=a)ba+

                      REPLACE 123a

                      => the syntax @(?<=a)ba+@123a@, in the script

                      With the Python regex engine, the text is correctly modified as aaaa123a123a ( two S/R ) whereas, with the Boost regex engine, after clicking on the Replace All button, we get the wrong string aaaa123abaaa

                      Indeed, the second match never occurs, although it should: the last character of the replacement, a, is right before the baaa string, so the look-behind is satisfied and a second match is expected :-((
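A short sketch with Python's own re module shows how look-behind context can get lost. Cutting the text after the first match (an assumed model of the failing case, not a verified diagnosis of N++) hides the preceding a, while searching from an offset in the full string keeps it visible.

```python
import re

PATTERN = re.compile(r"(?<=a)ba+")
text = "aaaabaaabaaa"

# Single scan over the whole string: both "baaa" runs are replaced.
whole = PATTERN.sub("123a", text)

# Slicing after the first match throws away the look-behind context ...
first = PATTERN.search(text)
tail = text[first.end():]
match_in_slice = PATTERN.search(tail)

# ... whereas passing a start offset keeps the preceding characters visible.
match_with_pos = PATTERN.search(text, first.end())

print(whole)
print(tail, match_in_slice)
print(match_with_pos.group(0))
```

The last two searches differ only in whether the engine can still see the character before the start position.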

                      Cheers,

                      guy038

                      • Eko palypse @guy038

                        @guy038

                        are you really using the python regex engine?
                        This would mean you have some code like re.sub(pattern, repl, string, count=0, flags=0)
                        but the snippet you showed earlier uses editor.rereplace which is supposed to be the boost regex engine.

                        • guy038

                          Hi, @eko-palypse, @alan-kilborn and All,

                          Hmm…, I’m a bit confused ! When I say : “With the Python regex engine…”, I just mean that I did all the tests with Alan’s script, above, which does use the helper method editor.rereplace ! And, of course, with the classical N++ Replace dialog, to compare.

                          In fact, I was already aware of this, as, some time ago, I noticed differences while using Scott Sumner’s or Claudia Frank’s Python scripts, which dealt, essentially, with searches ! As, this time, we have a nice search and replace script, I just verified that my assumptions were correct : the present behavior of the editor.rereplace method gives improved results and seems to fix some bugs of the current implementation of the Boost library, within Notepad++ :-))

                          But, I’m not a true coder ! So, unfortunately, it’s… up to all of you to tell me why it looks better ;-))

                          Cheers,

                          guy038

                          • Alan Kilborn @Eko palypse

                            @Eko-palypse @guy038

                            So to clarify, when using the Pythonscript plugin, one can do 1 of 2 things:

                            • editor.rereplace() which uses the Boost regex that is very similar to, but maybe not exactly the same as the one directly in N++
                            • use re.sub() which uses the Python regex engine (which is its own thing, not Boost, not PCRE, not ANYTHING except Python’s own re module)

                            So far I believe everything discussed in this thread is using the FIRST one.
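As a rough illustration of the SECOND route, here is a sketch that pulls the buffer text out, runs Python's re on it, and writes it back. It assumes an editor-like object exposing getText() and setText(), as the PythonScript editor object does to my knowledge; FakeEditor and resub_buffer are hypothetical names so the sketch can run outside Notepad++.

```python
import re


def resub_buffer(editor_like, pattern, repl, flags=0):
    # Route 2: fetch the text, let Python's re do the work, write it back.
    # Route 1 would instead be a single editor.rereplace(pattern, repl) call.
    new_text, count = re.subn(pattern, repl, editor_like.getText(), flags=flags)
    if count:
        editor_like.setText(new_text)
    return count


class FakeEditor(object):
    # Stand-in for the real editor object, for testing outside Notepad++.
    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text

    def setText(self, text):
        self._text = text


if __name__ == "__main__":
    ed = FakeEditor("this is some text")
    print(resub_buffer(ed, r"\w+", lambda m: m.group(0).upper()))
    print(ed.getText())
```

One trade-off of this approach is that the whole buffer is rewritten in one go, so every line may count as changed even where nothing matched.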

                            • Alan Kilborn

                              @guy038 said:

                              When I (say) “With the Python regex engine…”, I’m just saying that I did all the tests with…Alan’s script

                              “With the Python regex engine” would be my SECOND bullet point above, but that is not what you’re doing unless you’ve changed the editor.rereplace() call in the script to a re.sub() call (and slightly changed the other logic to cope with that change).

                              BTW when you import re (to get access to the re.IGNORECASE aka re.I flag) that is all you are doing–getting access to that, which happens to be shared, for convenience, with the Boost regex engine.
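For what it's worth, a re.sub()-based variant could look roughly like the sketch below. It assumes rule lines of the form @search@replace@, as in the examples earlier in this thread; parse_rule and apply_rules are hypothetical names, and this is not the code of the actual script.

```python
import re


def parse_rule(line, delimiter="@"):
    # Split "@search@replace@" into (search, replace).
    # Assumes the delimiter never occurs inside either part.
    if len(line) < 3 or not (line.startswith(delimiter) and line.endswith(delimiter)):
        raise ValueError("rule must start and end with %r: %r" % (delimiter, line))
    parts = line[1:-1].split(delimiter)
    if len(parts) != 2:
        raise ValueError("expected exactly one inner %r in %r" % (delimiter, line))
    return parts[0], parts[1]


def apply_rules(text, rule_lines, flags=0):
    # Apply each rule in order with Python's re; later rules see the
    # output of earlier ones.
    for line in rule_lines:
        search, replace = parse_rule(line)
        text = re.sub(search, replace, text, flags=flags)
    return text


if __name__ == "__main__":
    print(apply_rules("This is a test", [r"@\A.@-@", r"@test@TEST@"]))
```

Note that Python's re writes back-references in the replacement as \1 or \g<0> rather than $0, so rules written for the Boost syntax would need adjusting.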

                              • Eko palypse

                                So, from what I gather, there is a difference in the implementation details of boost::regex in npp and in the PythonScript plugin. So the best would be for the PythonScript plugin to implement the missing pieces, and for npp to silently steal the code and adapt it so that both work the same ;-)

                                • Alan Kilborn @guy038

                                  @guy038 said:

                                  SEARCH \w+

                                  REPLACE \u$0

                                  AFAIK, they do not modify anything, ( I mean regarding case of characters ! ) when executed from a Python script :-((

                                  Interesting. I noticed that the following variant on that above WILL work to affect case when using editor.rereplace() in a script:

                                  Find: (\w+)
                                  Repl: \U\1

                                  It seems like either variant should capitalize all lowercase letters in a document. HOWEVER, only the script version does this! When run interactively with the Replace dialog in Notepad++, these 2 variants only capitalize the first letter of every “word”.

                                  Can anyone offer an explanation for:

                                  • why Guy’s original regex replace does nothing in the script
                                  • why both of these regex replaces only change to uppercase the first letter of every “word” when run with N++ interactive replace (but – and I think act correctly in the script)

                                  • Alan Kilborn @Alan Kilborn

                                    @Alan-Kilborn said:

                                    why both of these regex replaces only change to uppercase the first letter of every “word” when run with N++ interactive replace (but – and I think act correctly in the script)

                                    Let me correct this:

                                    • why both of these regex replaces only change to uppercase the first letter of every “word” when run with N++ interactive replace (but the one that involves capturing group #1 and using \1 in the replace part – acts correctly in the script, at least I think it does)

                                    Hmm, better but maybe still not a great way of expressing it. :-P

                                    • Eko palypse @Alan Kilborn

                                      @Alan-Kilborn

                                      If I understand you correctly, I’m totally lost - my setup must have some kind of builtin wizard, as I get a different result. So, just to clarify: having the text this is some text and aiming to get THIS IS SOME TEXT, we would use \w+ with \U$0 as the replacement, or (\w+) with \U$1. For me, both work the same in the dialog and neither works when called like editor.rereplace('\w+','\U$0') from a script. But you get a different result?

                                      • Alan Kilborn @Eko palypse

                                        @Eko-palypse

                                        Wow. WOW. I find I cannot reproduce my earlier results. It seems to be working consistently now (duplicating your results). I guess I have egg on my face, and sorry for the false alarm; unless it takes some sort of special sequence of actions to get into a weird mode! I did restart N++ an hour ago, so I suppose that is a possible explanation.

                                        One thing I am seeing now that the editor.rereplace() is doing “nothing”:

                                        I use the LocationNavigate plugin for its ability to specially mark changed lines (wish there was a better/more-current solution to that, btw!). When I run the scripted replace, although visually no text changes, the plugin does mark every line where \w+ matches. Basically this means all lines besides empty ones got “changed”. So…not really sure what under-the-hood voodoo magic is happening when using \U (and probably others like it) with a scripted replace, but it sure seems like SOMETHING interesting might be happening. If what is happening is that the \U is being ignored and abc is simply being replaced by abc perhaps that is not all that interesting. :(

                                        • Eko palypse @Alan Kilborn

                                          @Alan-Kilborn

                                          :-) … life is interesting, isn’t it … what is true now might be false in the next minute :-)

                                          From the modify callback I can see that a replace does happen, but it just ignores the \U

                                          {'code': 2008, 'annotationLinesAdded': 0, 'text': 'this', 'modificationType': 1048576, 'token': 0, 'linesAdded': 0, 'length': 4, 'foldLevelPrev': 0, 'position': 0, 'line': 0, 'foldLevelNow': 0}
                                          {'code': 2008, 'annotationLinesAdded': 0, 'text': 'is', 'modificationType': 1048576, 'token': 0, 'linesAdded': 0, 'length': 2, 'foldLevelPrev': 0, 'position': 5, 'line': 0, 'foldLevelNow': 0}
                                          {'code': 2008, 'annotationLinesAdded': 0, 'text': 'some', 'modificationType': 1048576, 'token': 0, 'linesAdded': 0, 'length': 4, 'foldLevelPrev': 0, 'position': 8, 'line': 0, 'foldLevelNow': 0}
                                          {'code': 2008, 'annotationLinesAdded': 0, 'text': 'text', 'modificationType': 1048576, 'token': 0, 'linesAdded': 0, 'length': 4, 'foldLevelPrev': 0, 'position': 13, 'line': 0, 'foldLevelNow': 0}
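To help read these entries: code 2008 should be Scintilla's SCN_MODIFIED notification, and modificationType 1048576 is 0x100000, which as far as I know corresponds to SC_MOD_INSERTCHECK, i.e. the text that is about to be inserted. The sketch below decodes the bit field; the table is a deliberately partial subset of the Scintilla flags and the function name is hypothetical.

```python
# Partial table of Scintilla modification flags (values as in Scintilla.h,
# to the best of my knowledge). Bits not listed are reported in hex.
MOD_FLAGS = {
    0x1: "SC_MOD_INSERTTEXT",
    0x2: "SC_MOD_DELETETEXT",
    0x10: "SC_PERFORMED_USER",
    0x20: "SC_PERFORMED_UNDO",
    0x40: "SC_PERFORMED_REDO",
    0x400: "SC_MOD_BEFOREINSERT",
    0x800: "SC_MOD_BEFOREDELETE",
    0x100000: "SC_MOD_INSERTCHECK",
}


def decode_modification_type(value):
    # Return the names of all known flags set in value, lowest bit first.
    names = [name for bit, name in sorted(MOD_FLAGS.items()) if value & bit]
    known = 0
    for bit in MOD_FLAGS:
        known |= bit
    unknown = value & ~known
    if unknown:
        names.append(hex(unknown))
    return names


if __name__ == "__main__":
    print(decode_modification_type(1048576))
```

Read that way, the four entries show the words going back in unchanged ('this', 'is', 'some', 'text'), which fits the observation that \U is simply ignored.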
                                          
                                          • chcg

                                            Added the script as https://github.com/bruderstein/PythonScript/blob/master/scripts/Samples/Multiples_SR.py
