    Join lines from clipboard data

    • guy038

      Hello, @alan-kilborn and All,

      I was intrigued by the statement:

          editor3h  # third editor, hidden
      

      Do you mean that editor3h is an object which represents a virtual document, where you can perform any manipulation, independently from editor1 representing the main view document and editor2 representing the secondary view document ?

      Now, I have no problem with the commands below, because it’s obvious that, after an S/R, wherever the data is located, you need to select the modified results, copy them to the clipboard and paste them somewhere (at the insertion point of the JSON file, in this specific example):

          editor3h.rereplace('\r\n', ' ')
          editor3h.selectAll()
          editor3h.copy()
          editor.paste()
      

      But, as you said that the first step was to copy PDF file contents ( …in the clipboard ), I don’t see the purpose of the first editor3h.selectAll() command. To my mind, the editor3h.paste() command, which would copy the PDF contents of the clipboard to the virtual editor document, should be enough !?

      From a Python beginner ;-))

      Best Regards,

      guy038

      • Alan Kilborn @guy038

        @guy038 said in Join lines from clipboard data:

        Do you mean that editor3h is an object which represents a virtual document, where you can perform any manipulation, independently from editor1 representing the main view document and editor2 representing the secondary view document ?

        Short answer to long question:
        Yes. :-)

        I don’t see the purpose of the first editor3h.selectAll() command

        To my mind, the editor3h.paste() command, … should be enough !?

        You are probably only thinking about the first time the script is run.
        On first running, editor3h does not exist, so it is created.
        At that point there is no text in its document, so the select-all does effectively nothing.
        On second and further runs of the script, editor3h already exists – we don’t need to have N++ go through the overhead of creating a new one each time, we can just reuse the old one.
        But in this case, there is already text in the document (from the previous run).
        Selecting all of the text and then doing a paste effectively causes editor3h to then only contain the data from the paste.
        I could have done it in other ways, e.g., editor3h.setText("") as well.

        BTW, I could have dispensed with the editor3h technique altogether and just pasted into the user’s current document and manipulated from there. There are a lot of ways to do things (TMTOWTDI). But that way tends to get a bit messier because when you do an editor.paste() and you need to manipulate what you pasted, the size of what you pasted – so you know where to find the data in the doc – is not provided automatically and takes some effort to calculate. Now if text remained selected upon pasting, it would be easy.
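As a rough illustration of why the select-all matters on later runs, here is a toy model in plain Python. `ScratchBuffer`, `run_script` and their methods are made-up stand-ins, not the PythonScript `editor3h` object; they only mimic the rule that a paste replaces the current selection.

```python
class ScratchBuffer:
    """Toy stand-in for a reusable hidden document (hypothetical, not PythonScript)."""

    def __init__(self):
        self.text = ""
        self.sel = (0, 0)  # (start, end) of the current selection

    def select_all(self):
        self.sel = (0, len(self.text))

    def paste(self, clipboard):
        # Pasting replaces whatever is selected, then leaves the caret after it.
        start, end = self.sel
        self.text = self.text[:start] + clipboard + self.text[end:]
        caret = start + len(clipboard)
        self.sel = (caret, caret)


def run_script(buf, clipboard, select_first=True):
    """One 'run' of the script against a buffer that survives between runs."""
    if select_first:
        buf.select_all()
    buf.paste(clipboard)
    return buf.text


reused = ScratchBuffer()
first = run_script(reused, "run one")
second = run_script(reused, "run two")  # old text is replaced

stale = ScratchBuffer()
run_script(stale, "run one", select_first=False)
leftover = run_script(stale, "run two", select_first=False)  # old text lingers
```

With the select-all, the second run ends up holding only the new clipboard data; without it, the text from the first run is still sitting in front of the new paste.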

        • guy038

          Hi, @Alan-kilborn and All,

          My God ! Of course, I didn’t think about successive runs ! And it’s quite logical :

          • The editor3h.selectAll() command first selects all contents of editor3h, if any

          • The editor3h.paste() command then replaces that selection, if any, with the clipboard’s contents

          Thanks, also, for your additional information.

          BR

          guy038

          • Alan Kilborn @guy038

            @guy038

            :-)

            Perhaps this editor3h technique could help you with the regex stuff you do. A short script is probably easier to work with than macros (for those cases where you are building up several regex replacements in a row). The above script seems to contain all the examples you’d need.

            One thing to be careful of is, when using Python strings to contain regexes: Sometimes you want “raw” strings, and sometimes plain strings. I used a plain string above, with '\r\n', which encodes a CR and a LF (because the backslash is an “escape”). Often with regexes, though, you want backslashes to actually be themselves, in which case you do your string with an r out front, e.g. r'...' or r"...".

            These should be equivalent, but which one is easier to read when you’d rather be thinking about regex content, rather than proper backslashing?:

            myregex1 = "\\(.*?\\)"  # find literal ( followed by some chars followed by literal )
            myregex2 = r"\(.*?\)"  # same, but easier to read and think about
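A quick way to convince yourself that the two spellings really are the same string is to compare them in plain Python. This sketch uses the standard `re` module rather than the Boost engine behind PythonScript, and the sample text is made up.

```python
import re

myregex1 = "\\(.*?\\)"   # plain string, every backslash doubled
myregex2 = r"\(.*?\)"    # raw string, backslashes typed once

same = myregex1 == myregex2

# Made-up sample text, just to see the pattern at work.
matches = re.findall(myregex2, "f(x) and g(y, z)")
```

Here `same` is True, and `matches` holds the two parenthesized groups.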
            
            • Alan Kilborn

              I used a plain string above, with ‘\r\n’, which encodes a CR and a LF (because the backslash is an “escape”)

              Probably that was a bad example, to illustrate my point about raw strings versus not, in Python. Why?

              Because editor.rereplace('\r\n', ' ') and editor.rereplace(r'\r\n', ' ') will do exactly the same thing (for different reasons) when run.

              In the first case, without the r, Python converts the escapes itself, so the literal carriage-return and line-feed characters are what gets searched for; in the second case, the backslash sequences reach the regex engine untouched and are interpreted there, as the regular expressions \r and \n, respectively. In the end it is the same effect, but…bad example for illustrating the point.

              Also, I probably should have used r'\R', anyway – and that one is really obvious that we are handing over interpretation to the regex engine, because there is no such \R character!
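The "same effect, different reasons" point can be checked outside Notepad++. This is only a sketch using Python's own `re` module as a stand-in for the regex engine behind `rereplace`, with made-up sample text.

```python
import re

plain = '\r\n'    # Python turns this into 2 characters: CR and LF
raw = r'\r\n'     # 4 characters: backslash, r, backslash, n

text = "line one\r\nline two\r\nline three"

joined_plain = re.sub(plain, ' ', text)   # engine searches for literal CR LF
joined_raw = re.sub(raw, ' ', text)       # engine interprets \r and \n itself
```

The two patterns have different lengths (2 versus 4 characters), yet both calls join the lines in the same way.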

              • guy038

                Hi, @alan-kilborn,

                As you just spoke about normal and raw strings, this reminded me of something weird in the Python world !

                I used your script, below, to test the statistics of some files, including .exe ones

                https://community.notepad-plus-plus.org/post/61801

                In the initial script, the search regex is simply '\w+'. But I needed the regex '(\x00?[A-Za-z0-9_])+' and it did not work. I had to use a raw string, so the regex r'(\x00?[A-Za-z0-9_])+' !


                For instance,

                • In a new tab, type the simple string @ABCDE

                • Run the tiny script, below, executed from the console ( Maybe, Alan, it’s not a good Python construction but it works !! )

                console.clear() ; editor.research ('(\x40?\w)+', lambda m: console.write (m.group(0)))
                

                It correctly writes the string @ABCDE on the console

                • Now, replace the @ char with the NUL char, with the help of the Character Panel

                • Run this similar script :

                console.clear() ; editor.research ('(\x00?\w)+', lambda m: console.write (m.group(0)))
                

                This time, we get an error !

                Then, run this third try, using a raw string :

                console.clear() ; editor.research (r'(\x00?\w)+', lambda m: console.write (m.group(0)))
                

                Wow ! It works as expected and displays the NUL char followed by the string ABCDE, on the console

                I hope, I’m not disturbing you too much ;-))

                See you later,

                Cheers ( almost the right moment, in France ! )

                guy038

                • Alan Kilborn @guy038

                  @guy038

                  Without looking into your specific examples just now – I’ll do that later, after my own “Cheers!” moments – I’ll say:

                  If you follow the rules, you get what is expected. If you don’t follow the rules, you get “unexpected” behavior – which could sometimes be what you expect, or often not!

                  The approximate rules are:

                  • If you are going to use a regular string, i.e., "..." and that string is going to contain a literal backslash, you must double that backslash (to escape it).

                  • If you are going to use a “raw” string, i.e., r"...", then you do NOT (typically!) double any backslashes. This would be the preferred way to do it for someone that writes a lot of literal paths in their source code, or someone that works heavily with regex.
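To make the two rules concrete, here is a small plain-Python sketch with a made-up Windows path. It also shows what goes wrong when a regular string is used without doubling: `\t` and `\n` are valid Python escapes, so they silently turn into a tab and a newline.

```python
doubled = "c:\\temp\\new"   # regular string, backslashes doubled
raw = r"c:\temp\new"        # raw string, backslashes typed once
careless = "c:\temp\new"    # regular string, NOT doubled: \t is a tab, \n a newline

same = doubled == raw
has_tab = '\t' in careless
has_newline = '\n' in careless
```

The first two spellings give the same 11-character path; the careless one is shorter and contains no backslash at all.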

                  Of course, there’s a bit more to the story, but that will come in a later posting…

                  There’s also “byte strings” and “unicode strings”, denoted b"..." and u"..." respectively, but these are mainly (but not always) Python3 things, and Python3 for PythonScript is still in beta, so we won’t really discuss them.

                  Also note that we aren’t going too far off the topic of N++ with this, just enough for scriptwriters – definitely “on-topic”.

                  • Alan Kilborn

                    @guy038

                    So back to your examples.
                    It all does make sense.
                    Here’s how:

                    The first example is '(\x40?\w)+'.

                    What happens here is that Python encodes \x40 as a single character and passes it to the research function. Because \x40 is @ it works.

                    Had you done, r'(\x40?\w)+' instead, Python would NOT have touched the \x40 part, and would have passed it as 4 distinct characters. The research function would have passed these same 4 characters into the Boost regex engine, and inside there it would have seen the notation \x40 and converted 4 characters into one, equivalent to @. So the difference is WHERE the encoding of the four characters is done.

                    Thus, for the first example, it is successful either way, with a normal Python string or a raw string. It just gets there by different routes.
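The "different routes" idea can be seen by looking at what each spelling actually hands over. This is a sketch in plain Python, with the standard `re` module standing in for Boost (an assumption; the two engines differ in other respects). The `\w` is written `\\w` in the plain string only so the example does not rely on how Python treats unknown escapes.

```python
import re

plain = '(\x40?\\w)+'   # Python converts \x40 to @ before the engine sees it
raw = r'(\x40?\w)+'     # the engine receives the four characters \x40 itself

plain_has_at = '@' in plain
raw_has_at = '@' in raw

found_plain = re.search(plain, '@ABCDE').group(0)
found_raw = re.search(raw, '@ABCDE').group(0)
```

Only the plain string contains a real @ character, but both patterns match the whole of @ABCDE.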

                    The second example is '(\x00?\w)+'. Here the \x00 is converted into a single NUL character when Python parses the string literal. Because C strings are (often) NUL-terminated, meaning the NUL signals the end of the string, the regex engine sees only a single (, which is all that comes before the NUL in your string. That is clearly a badly formed regex, so the engine says:

                    RuntimeError: Unmatched marking parenthesis ( or \(.

                    When we move to the third example (related to the second), specifically: r'(\x00?\w)+', you are again shifting the decoding to late in the game – when Boost gets hold of it. Thus, here Boost sees (\x00?\w)+ – each individual character in that – and things work. The string won’t get NUL-terminated early. Boost gets a chance at the whole string.

                    Hopefully this makes sense.
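Python's own `re` module is not bothered by a NUL inside a pattern, so the failure cannot be reproduced directly outside PythonScript. The sketch below therefore imitates the C-string truncation with a hypothetical helper, `as_c_string`, which is an assumption made for illustration and not part of any real API.

```python
import re


def as_c_string(s):
    """Mimic a NUL-terminated C API: everything from the first NUL on is lost."""
    return s.split('\x00', 1)[0]


plain = '(\x00?\\w)+'   # contains a real NUL character after the (
raw = r'(\x00?\w)+'     # contains backslash, x, 0, 0 instead

seen_plain = as_c_string(plain)   # only the opening parenthesis survives
seen_raw = as_c_string(raw)       # nothing to cut, the pattern arrives whole

try:
    re.compile(seen_plain)
    plain_compiles = True
except re.error:
    plain_compiles = False        # unbalanced parenthesis

match = re.search(seen_raw, '\x00ABCDE')
```

The truncated pattern fails to compile, while the raw one matches the NUL followed by ABCDE.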

                    The bottom line is that Python will use some of the same “escape sequences” that regex will use, if you give it a chance to. By “give it a chance to”, I mean by using a plain string with the syntax "..." (note, no little r preceding).

                    If you want Python to leave your strings alone, so that they are passed fully as typed, to something else, e.g., the regex engine, wrap them as a raw string, i.e., r"...".

                    So, what happens to the \w in the first example, when Python sees it? There is no r out front, so Python scans the string and can do things to it. Well, the answer is that Python does not know what \w is, so it sends it on as 2 characters, \ and w. Thus this goes on to Boost just as intended.
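This can be checked from plain Python. Recent Python 3 releases emit a warning for unknown escapes such as `\w`, so the sketch below evaluates the literal at run time with warnings silenced; that detour is only there to keep the demonstration quiet.

```python
import warnings

source = r"'\w'"   # the literal exactly as typed: quote, backslash, w, quote

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # newer Pythons warn about unknown escapes
    unknown = eval(source)

known = '\x40'     # an escape Python does recognise
```

The unknown escape comes through as two characters, backslash and w, whereas the recognised one collapses into the single character @.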

                    There’s yet another caveat. In a r"..." string, normally any \ are just literal characters, but a raw string cannot end with a single backslash (more precisely, with an odd number of them), because the backslash still keeps the closing quote from ending the string. Thus if you type a=r'abc\' at the console, you’ll get an error:

                    SyntaxError: EOL while scanning string literal

                    so you’d need to do:

                    a = r'abc\\'

                    instead. Note that both backslashes then remain in the string. This most often comes up when specifying a directory, where a doubled trailing separator is usually tolerated, example:

                    mydir = r"c:\mydir1\mysubdir2\\"
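A short plain-Python check of what the trailing-backslash spellings actually contain. The alternative shown at the end, gluing a regular `'\\'` onto a raw string, is just one possible workaround and not something from the post above.

```python
# The source text  a = r'abc\'  does not even compile.
try:
    compile("a = r'abc\\'", "<demo>", "exec")
    raw_single_ok = True
except SyntaxError:
    raw_single_ok = False

doubled = r'abc\\'   # compiles, but keeps BOTH backslashes
single = 'abc\\'     # regular string: exactly one trailing backslash

# One way to end a raw path with a single backslash.
joined = r'c:\mydir1\mysubdir2' + '\\'
```

So the doubled raw form is five characters long, and a regular string or a concatenation is needed when exactly one trailing backslash matters.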

                    So just some explanation about this; hopefully it helps in some way.

                    • guy038

                      Hi, @alan-kilborn and All,

                      Ah, very interesting insight, indeed ;-))

                      Yes, it’s sometimes not easy to know when characters are interpreted, if different nested processes are involved !

                      So, globally, in most situations, it seems better to always use raw strings. I mean… when using regexes, of course

                      BR

                      guy038

                      • Alan Kilborn @guy038

                        @guy038 said in Join lines from clipboard data:

                        it seems better to always use raw strings. I mean… when using regexes, of course

                        I would say this is true.

                        Probably most normal Pythoners don’t use a lot of backslashes in their strings, so the simple "..." syntax works fine.

                        And you certainly can use "..." with regex as well, but just remember to “double up” every backslash if you do. (But that can make your regexes a nightmare to look at).
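For a feel of how quickly the doubling piles up, consider a regex that matches one literal backslash. The sketch uses Python's `re` module and a made-up path.

```python
import re

path = r"c:\mydir1\mysubdir2"

# The regex needs \\ to mean "one literal backslash" ...
parts_raw = re.split(r"\\", path)       # ... typed as 2 backslashes in a raw string
parts_plain = re.split("\\\\", path)    # ... and as 4 in a regular string
```

Both calls split the path into the same three pieces; the raw version is simply easier on the eyes.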
