Community
    • Login

    Replace certain text in lines with each line another file.

    Scheduled Pinned Locked Moved General Discussion
    search & replace
    16 Posts 4 Posters 9.5k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • guy038G
      guy038
      last edited by guy038

      Hello, @gideon-addai, @scott-sumner, @terry-r and All,

      I think I found out a solution, which needs ONE click, only, of the Replace All button :-)) So ,

      • Copy/paste your data file in N++ new 1 tab

      • Again, copy and paste your data file in new 2 tab

      • Let’s suppose we just have this part of your file, below, as sample :

      str/4
      <</Contents(100 cups)/(Date)
      Colour red
      <</Contents(080 bowls)/(Date)
      Status used
      Pack team
      <</Contents(200 John)/(Date)
      School house
      
      • Then, in new 2 tab, open the Replace dialog ( Ctrl + H )

      • Select the Regular expression search mode

      • Tick the Wrap around option

      SEARCH (?-si)^<</Contents\((.+)\).+|.+\R

      REPLACE ?1\1

      • Click on the Replace All button

      You should get, in new 2 tab, the list of all the strings located between the strings <</Contents( and )/(Date)

      100 cups
      080 bowls
      200 John
      
      • Now, in new 1 tab, after a separation line ( I personally chose a line of tildes ), append all the new 2 contents

      • Then, copy/paste your list of new strings, in new 3 tab

      • And append all new 3 contents at the very end of new 1 tab, after, again, a separation line, with tildes

      You should see, in new 1 tab, the following text :

      str/4
      <</Contents(100 cups)/(Date)
      Colour red
      <</Contents(080 bowls)/(Date)
      Status used
      Pack team
      <</Contents(200 John)/(Date)
      School house
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      100 cups
      080 bowls
      200 John
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Tree house
      Colon format
      Same variable
      
      • Finally, open, again the Replace dialog and perform, in the new 1 tab, the regex S/R, below :

      SEARCH (?-s)(?<=Contents\()(.+?)(?=\)/(?:(?s).+)\1\R(?:.+\R){3}(.+))|(?s)^~~~.+

      REPLACE ?2\2

      After a single click, on the Replace All button ( Do not use the Replace button ! ), you should get your expected text :

      str/4
      <</Contents(Tree house)/(Date)
      Colour red
      <</Contents(Colon format)/(Date)
      Status used
      Pack team
      <</Contents(Same variable)/(Date)
      School house
      

      Voila !

      Notes :

      • First, note the parentheses symbols, ( and ), must be escaped with the \ symbol to be considered as literals !

      • As usual, at beginning, the (?-s) modifier means that the dot . character will match any single standard character

      • The main search , (.+?), stands for the shortest non-null range of characters, stored as group 1 for later use, but ONLY IF two conditions are true :

        • It must be preceded with the string Contents\(, due to the the look-behind (?<=Contents\()

        • It must match the regex \)/(?:(?s).+)\1\R(?:.+\R){3}(.+), due to the look-ahead (?=\)/(?:(?s).+)\1\R(?:.+\R){3}(.+))

          • The string )/, due to the regex part \)/

          • The longest multi-lines non-null range of characters, due to the non-capturing group (?:(?s).+)

          • Till the group 1 contents, ( your initial string between parentheses ), with its end of line, due to the syntax \1\R

          • Then followed with 3 complete lines, with the non-capturing group (?:.+\R){3}. Of course, if your list of strings, to modify, contains, let’s say, 25 values, just write (?:.+\R){25}, instead !

          • And, finally, followed with the corresponding new string, which have to be placed between parentheses, stored as group 2, due to the (.+) regex

      • Now, after all the initial strings are read, the regex engine tries the second alternative of the search regex, after the | symbol, that is to say the regex (?s)^~~~.+, which grabs the longest range of any character, from the first separation line till the very end of the file, due to the (?s) modifier which means that dot . matches any single character ( standard and EOL one )

      • In replacement :

        • If the first alternative of the search regex matched, the group 2 exists and the initial string is changed into the new corresponding string, due to the conditional replacement ?2\2

        • If the second alternative matched, all text, beginning at the first separation line, is deleted, as group 2 does not exist !

      Again, remember that the number 3, between braces, near the end of the search regex, must be changed with the exact number of “strings” to modify, in your text

      Cheers,

      guy038

      1 Reply Last reply Reply Quote 4
      • Scott SumnerS
        Scott Sumner
        last edited by

        So @guy038’s solution is a good one, and I like that when it is finished executing, there is no “clean-up” of extraneous data, as well as it requires only one Replace All action. It only has 2 minor drawbacks: it requires more data preparation, and it requires a count to be embedded (reference the {3} part).

        I’ll propose the following solution (drawback: requires multiple Replace All actions); prepare data as follows, all in one file (put lines of replacement data at the top, followed by a line of 10 dashes, followed by the main file data):

        Tree house
        Colon format
        Same variable
        ----------
        str/4
        <</Contents(100 cups)/(Date)
        Colour red
        <</Contents(080 bowls)/(Date)
        Status used
        Pack team
        <</Contents(200 John)/(Date)
        School house
        

        Do the following:

        Invoke Replace dialog (default key: ctrl+h)
        Find what zone: (?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?Contents\()\d+ \w+)|(?:\R+-{10}\R)
        Replace with zone: $+{group2}$+{group1}
        Wrap around checkbox: ticked
        Search mode selection: Regular expression
        Action: Press Replace All button repeatedly (or hold down it’s Alt+a accelerator) until you see Replace All: 0 occurrences were replaced.

        Note: I used named groups (“group1”, “group2”) here just for an added twist on regex…I didn’t come up with great names for the groups, however!

        Perhaps it is interesting to actually see the steps the replacement goes through to transform the data…it’s short…so:

        First match (top) and post-first-match replacement (bottom):

        Imgur

        Note: “group1” is the initial yellow color in the top part, “group2” is the brown color, “group3” is the final yellow color in the top part

        Second match (top) and post-second-match replacement (bottom):

        Imgur

        Third match (top) and post-third-match replacement (bottom):

        Imgur

        Fourth match (top) and post-fourth-match replacement (bottom) – this is merely a “clean-up” step:

        Imgur

        1 Reply Last reply Reply Quote 1
        • guy038G
          guy038
          last edited by guy038

          Hello, @gideon-addai, @scott-sumner, @terry-r and All,

          Ah, Scott, I was about to post when I just saw your reply. Interesting use of named groups, both in search and replacement ! I agree that my regex needs more data preparation ! Regarding the second drawback, you’re teasing me aren’t you :-)) Not very difficult to get the line number of the last line of a file, in the status bar !

          So, here is, below, a second version of the regex S/R, of my previous post, where I, simply, change the logic of the modifiers :

          SEARCH (?s)(?<=Contents\()(.+?)(?=\)/.+\1\R(?:[^\r\n]+\R){3}([^\r\n]+))|^---.+

          REPLACE ?2\2

          Notes :

          • Now, the (?s) modifier is set, by default. That is to say that the dot . can match, absolutely, any single character

          • In that case, of course :

            • The regex (?-s).+\R must be changed as (?s)[^\r\n]+\R

            • The regex (?-s).+ must be changed as (?s)[^\r\n]+


          BTW, I found out, in the list of regex tools, referenced on the www.rexegg.com site, an amazing on-line tool, at :

          https://www.debuggex.com/

          • Choose the PCRE flavor, instead of the default JavaScript one

          • Then, in the Regular Expression editor area, paste your regex, to test

          • And, in the Text String Editor area, paste your test data

          For instance, see my second regex version, above, at the address, below :

          https://www.debuggex.com/r/LUBgLokvpxHUJnrv

          Astonishing, isn’t it ?

          IMPORTANT :

          • Before pasting your regex, in Debuggex site :

          • Change any syntax \R into \n

          • Change any syntax \r into \n

          • Change any syntax \r\n into \n

          Hence, paste the regex (?s)(?<=Contents\()(.+?)(?=\)/.+\1\n(?:[^\r\n]+\n){3}([^\r\n]+))|^---.+


          I’ll probably test some regex tools and, soon, I’ll add some links to the main tools, in the FAQ Desk post, relative to regex documentation :-))

          Best Regards,

          guy038

          1 Reply Last reply Reply Quote 2
          • Gideon AddaiG
            Gideon Addai
            last edited by

            Thanks to you guys for all the helpful comments. @Terry R, the entries am changing are many, a thousand maybe. There is this other method I tried that worked just fine. Though it kinda included quiet a handful of steps.

            It mainly involves using the Alt+Shift selection method and TextFX plugin. The aim is to try and get all lines with "<</Contents(…)/(Date) following each other, replace everything within “Contents(” and “)” with a character, and replace that character with text from second file and sort everything in it original order.

            Insert the line number for each line: This will lead us back to the original order.

            1. Select the first character in each line using Alt+Shift and arrow keys.

            2. Select TextFX>>TextFX Tools>>T:Insert line numbers. The result looked like this;

                              00000001 str/4
                              00000002 <</Contents(100 cups)/(Date)
                              00000003 Colour red
                              00000004 <</Contents(080 bowls)/(Date)
                              00000005 Status used
                              00000006 Pack team
                              00000007 <</Contents(200 John)/(Date)
                              00000008 School house
              

            Sort original data in ascending order: This will get the lines <</Contents(…)/(Date) following each other
            3. Repeat step 1 for the orginal data (not the inserted line numbers)
            4. Select TextFX>>TextFX Tools>>T:Sort lines case insensitive (at column). The result looked like this

                              00000004 <</Contents(080 bowls)/(Date)
                              00000002 <</Contents(100 cups)/(Date)
                              00000007 <</Contents(200 John)/(Date)
                              00000003 Colour red
                              00000006 Pack team
                              00000008 School house
                              00000005 Status used
                              00000001 str/4
            

            Search and Replace everything between “Contents(” and “)” with a character (0 maybe): This will help us easily replace text from the second file (ie. the content of “Contents(” and “)” don’t contain the same number of characters)
            5. Press Ctrl+H
            6. Search (nts\(.*?\)/)} and replace with nts\(0\)/ with regular expression checked (might be a better regex to do this though). The result looked like this

                            00000004 <</Contents(0)/(Date)
                            00000002 <</Contents(0)/(Date)
                            00000007 <</Contents(0)/(Date)
                            00000003 Colour red
                            00000006 Pack team
                            00000008 School house
                            00000005 Status used
                            00000001 str/4
            

            Replace all 0 with text from second file
            7. Use Alt+Shift+arrow keys, to select all columns in the second file (Don’t use Ctrl+a) and copy.
            8. Select all 0 between “Contents(” and “)” like you did in step 1 and 3
            9. Press Ctrl+v to paste all-column-copied text from second file. The result looked like this

                           00000004 <</Contents(Tree house)/(Date)
                           00000002 <</Contents(Colon format)/(Date)
                           00000007 <</Contents(Same variable)/(Date)
                           00000003 Colour red
                           00000006 Pack team
                           00000008 School house
                           00000005 Status used
                           00000001 str/4
            

            Select and sort the numbers from step 2 in ascending order: This will restore the original order
            10. Using the same Alt+Shift+arrow keys approach, select all the beginning numbers
            11. Select TextFX>>TextFX Tools>>T:Sort lines case insensitive (at column). The result looked like this;

                             00000001 str/4
                             00000002 <</Contents(Colon format)/(Date)
                             00000003 Colour red
                             00000004 <</Contents(Tree house)/(Date)
                             00000005 Status used
                             00000006 Pack team
                             00000007 <</Contents(Same variable)/(Date)
                             00000008 School house
            

            Finally delete all beginning numbers and space: This will restore the orginal format
            12. Use the same Alt+Shift+arrow keys to select all number and space columns and delete

            It involves a number of steps like I said and probably might not be the best approach, but I hope the idea would help someone.

            1 Reply Last reply Reply Quote 4
            • Terry RT
              Terry R
              last edited by

              @Gideon-Addai , see what you’ve done to the team @Notepad++ community (@Scott-Sumner, @guy038)! Your problem, on top of the one I had previously provided a solution for has really set the team off. We’re all trying to outdo each other now with the BEST regex to solve this problem.;-D

              Of course as we all realize, Notepad++/regex was never designed for such a task (without plugins or lots of preperation). And whilst we can make it do it (in a round about way), in all seriousness I think a script, be it Python or something else would do the job much easier. But therein lies the rub. To create a script requires an even greater learning curve. Not many are willing to take that step, if at all they can remain within the “REGEX” world!

              Terry

              PS just wait @guy038 and @scott-sumner, I’ve got the gloves off now. An idea is percolating within the grey matter. It’s a bit far fetched but I’ll follow it as far as it will take me.

              Scott SumnerS 1 Reply Last reply Reply Quote 1
              • Terry RT
                Terry R
                last edited by

                Another solution

                1. First tab has the original file
                2. Second tab has the replacement strings, 1 per line
                3. For each of the tabs (1 & 2) select the left most position in line 1. Select Edit, Column editor. Select text to add and type in a space, then again select column editor and for the first tab select number using 1, increase by 2 (leading zeros ticked). For the second tab select 2, increase by 2 (leading zeros ticked). Both tabs have increasing numbers seperated by a space from the original text on each line.
                4. For the first tab, Search, Mark. Tick “Bookmark line” and type in the character string that determines it is a line needing (partial) replacement.
                5. Create a new tab (third tab) and cut the bookmarked lines to it, Search, Bookmark, Cut bookmarked lines. Select Tab #3 and paste.
                6. Repeat step 3 on tab #3, adding first a space, then a number, again start at 1, increase by 2 (leading zeros ticked).
                7. Copy the contents of tab #2 to the bottom of tab#3.
                8. Edit, Line Operations, Sort the lines as integer ascending. Add a blank line at bottom.
                9. It is a simple matter to merge the 2nd line of each pair into position in the first line, and removing the first number on the first line of the pair.
                10. Copy the contents of tab #3 back to tab #1.
                11. Edit, Line Operations, Sort the lines as integer ascending.
                12. Remove the numbers and space at the start of each line.

                For the example provided the regex’s are:
                #9 Find what: \d+\h(\d+\h)(.+?Contents\().+?(\).+?\R)\d+\h(.+)\R
                Replace with: \1\2\4\3
                #12 Find what: \d+\h
                Replace with: empty field here

                So is it any better than @guy038 solution? I think it is marginally. First off it does not require finding out the number (although to be fair as @guy038 said, just look at the number of the last line). It does look a bit less complex as no need to have the original and replacement text “linked” so number of lines between remains constant, see “You should see, in new 1 tab, the following text :” and the image.
                The main regex is a lot neater, less complex. The 2nd regex is very basic.
                It also does not require any PlugIn as per OP’s own solution using TextFX, so more portable. And the OP’s solution also had 12 steps, @guy038 had 11.

                This was not my original idea as per previous post, but I happened to think of this along the way. So I may (or NOT) yet provide another alternative.

                Not yet tested is how to redesign this when the original text to be replaced covers multiple lines!

                Terry

                1 Reply Last reply Reply Quote 2
                • guy038G
                  guy038
                  last edited by guy038

                  Hello, @gideon-addai, @scott-sumner, @terry-r and All,

                  @gideon-addai, your method, involving the TextFX plugin just needs a lot of manipulations. But if you are just used to TextFX, why not !


                  @terry-r, I think that your regex, at step #9, in your last post, could be shortened a bit, as below :

                  SEARCH (?-s)\d+\h(.+?\().+?(\).+)\R\d+\h(.+)

                  REPLACE \1\3\2

                  So, from the text, in tab #3 :

                  1 03 <</Contents(100 cups)/(Date)
                  2 Tree house
                  3 07 <</Contents(080 bowls)/(Date)
                  4 Colon format
                  5 13 <</Contents(200 John)/(Date)
                  6 Same variable
                  

                  It would give :

                  03 <</Contents(Tree house)/(Date)
                  07 <</Contents(Colon format)/(Date)
                  13 <</Contents(Same variable)/(Date)
                  

                  Best Regards,

                  guy038

                  1 Reply Last reply Reply Quote 1
                  • Scott SumnerS
                    Scott Sumner @Terry R
                    last edited by

                    @Terry-R said:

                    We’re all trying to outdo each other now with the BEST regex to solve this problem

                    No…no…no…please let’s NOT do this here. Maybe on a forum strictly dedicated to regex, but not here.

                    For the record, I don’t post to present a “better” regex to solve a problem that has already been solved. I might, however, post to show a new twist on something…in this case the technique to hold down a keycombo to repeatedly run a Replace All…often a very effective technique that makes writing the necessary regex faster and less complicated. It’s not a brand-new twist (running repeated Replace Alls), but it doesn’t get discussed a huge amount here.

                    Also, in this case, I had my solution mostly composed already when @guy038 submitted his post first…it would have been disheartening to hit Discard on so much text. :-)

                    I don’t think there’s a lot of value in a “best” regex for typical usage by Notepad++ users. Most simply want a solution to their problem so they can move on. So there is little value in presenting something to a solved problem that shaves a few characters off a regex…unless there is a key point also being made (which should definitely be elaborated on with additional text!). This is not intended to ignore others who might put their regex into a production environment where it will be run all the time–in these cases regex efficiency (one version of “best”) is critical…but again that is not typically the case here. And a final thought: a “best” regex, in terms of said efficiency, is often NOT the shortest one. :-)

                    1 Reply Last reply Reply Quote 0
                    • guy038G
                      guy038
                      last edited by guy038

                      Hello, @gideon-addai, @scott-sumner, @terry-r and All,

                      Scott, when you said :

                      So there is little value in presenting something to a solved problem that shaves a few characters off a regex…

                      I think that it’s… just the case of my last post ;-)) Of course, I agree with you that Terry-r’s regex works nice and that my last contribution was not essential at all ! However, sometimes, I, simply, think that a shorter regex may be more easily understood, for everybody !

                      But, eventually, you’re right : the best regex, in terms of number of matches and minimum execution speed, is not always the shortest one ! And, generally, building a good generic regex, which can suit in many texts, at the same time, is rather difficult to achieve !

                      Finally ,instead of speaking about the concept of “best” regex, we would rather speak about the most suitable regex for each OP :-))

                      Cheers,

                      guy038

                      1 Reply Last reply Reply Quote 2
                      • Terry RT
                        Terry R
                        last edited by

                        Apologies @guy038 and @scott-sumner, my post on “best” regex and “trying to outdo each other” was not meant seriously (;-D). And yes our primary goal here is to help the OP with their problem. I do still think though that we are trying to make our regex’s more efficient through this process. Does that equate to “best”, possibly in a roundabout way? And yes more efficient doesn’t necessarily mean faster, it may mean less complicated for the OP to understand and follow.

                        And I welcome having others critique my regex, thanks @guy038, your “shortened” version of my regex was welcomed. I hadn’t spent a lot of time on the regex, I was mainly concerned with looking for a “more efficient”/“less complicated” solution.

                        Thanks
                        Terry :-))

                        1 Reply Last reply Reply Quote 3
                        • D DankiestCitra referenced this topic on
                        • Terry RT Terry R referenced this topic on
                        • First post
                          Last post
                        The Community of users of the Notepad++ text editor.
                        Powered by NodeBB | Contributors