
    For regex users: how can I add some lines after some other lines (intersect them)

    • Neculai I. FantanaruN
      Neculai I. Fantanaru
      last edited by

      Good day, regex users. I have a question for you: how can I add some lines after some other lines? Basically, I have to interleave two different texts: line 1 goes after line 8, and so on.

      for example:

      My mother is home.
      My father is with my sister.
      I have to go home.
      I need some help.
      God is everywhere.
      My dog is with her cat.
      
      https://sentence number one
      sentence number two
      www.sentence number three
      sentence number four
      sentence number five
      23 sentence number six
      

      My desired output should be:

      https://sentence number one
      My mother is home.
      
      sentence number two
      My father is with my sister.
      
      www.sentence number three
      I have to go home.
      
      sentence number four
      I need some help.
      
      sentence number five
      God is everywhere.
      
      23 sentence number six
      My dog is with her cat.
      
        • Terry RT
          Terry R
          last edited by

          @Neculai-I-Fantanaru
          Here is my solution

          Load each of the 2 files in a separate tab within Notepad++.

          In the first file (containing line 1 etc.), put the cursor at the very start of the first line. Use the Edit menu, Column Editor, to first add a space (“Text to Insert”), then repeat with “Number to Insert”, starting at 1 and increasing by 2, with “Leading zeros” ticked.

          In the second file (containing the 2nd line that corresponds with line 1 from the other file), repeat the above steps, but use 2 as the starting number, still increasing by 2.

          In both files, add a blank line at the end. This can be achieved by using the following regex on each file.
          Find What:\z
          Replace With:\r\n

          At this point using your example data I have the following.
          File 1:

          01 https://sentence number one
          03 sentence number two
          05 www.sentence number three
          07 sentence number four
          09 sentence number five
          11 23 sentence number six
          
          

          and File 2:

          02 My mother is home.
          04 My father is with my sister.
          06 I have to go home.
          08 I need some help.
          10 God is everywhere.
          12 My dog is with her cat.
          
          

          Note there is actually a blank line at the bottom of both example files/tabs (last line).

          As your example states there is to be a blank line following each pair we still have to edit the 2 file/tab contents to achieve this. In this tab use the following regex to do that.
          Find What:^(\d+)( )(.+)(\R)
          Replace With:\1a\2\3\4\1b\4
          This adds an “a” on the first line behind the number sequence and adds a blank line with the same number followed by a “b”. All will be clear shortly.

          Now we will combine them and sort. You can either insert the contents of one file into the other, or make a separate tab and combine them there. So once combined in a tab, use the Edit (menu), Line Operations, Sort Lines lexicographically ascending. This makes the example look like this:

          
          01 https://sentence number one
          02a My mother is home.
          02b
          03 sentence number two
          04a My father is with my sister.
          04b
          05 www.sentence number three
          06a I have to go home.
          06b
          07 sentence number four
          08a I need some help.
          08b
          09 sentence number five
          10a God is everywhere.
          10b
          11 23 sentence number six
          12a My dog is with her cat.
          12b
          

          All that is required now is to clean up the lines removing the number and possible “a” or “b” and the space. So we have a final regex.
          Find What:^\d+[ab]*\h*
          Replace With: (leave this field empty)

          Hey presto, you have the result as you outlined. Just check first and last lines which may still be additional blank lines and remove if necessary.
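For readers who would rather script it, the number, sort, and strip procedure above can be sketched in Python. This is only an illustration of the same idea and not part of the original post: the function name `interleave_by_sort` is hypothetical, and `[ \t]` stands in for `\h`, which Python's `re` module does not support.

```python
import re


def interleave_by_sort(first_lines, second_lines):
    """Tag lines with sortable numbers, sort them, then strip the tags."""
    # Enough digits for the largest number, so a plain text sort stays in order.
    width = len(str(2 * max(len(first_lines), len(second_lines))))
    tagged = []
    # Odd numbers (01, 03, ...) for the lines that come first in each pair.
    for i, line in enumerate(first_lines):
        tagged.append(f"{2 * i + 1:0{width}d} {line}")
    # Even numbers for the second lines: "a" marks the text, "b" the blank line.
    for i, line in enumerate(second_lines):
        number = f"{2 * i + 2:0{width}d}"
        tagged.append(f"{number}a {line}")
        tagged.append(f"{number}b")
    tagged.sort()  # lexicographic ascending, like Sort Lines in the editor
    # Same clean-up as the final regex: drop the number, any a/b, and the space.
    return [re.sub(r"^\d+[ab]*[ \t]*", "", line) for line in tagged]
```

Called with the six "sentence number" lines as `first_lines` and the six "My mother is home." lines as `second_lines`, it returns the 18 lines of the requested output, blank lines included.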

          Good luck. Let us know if there are issues; perhaps something not covered in your request causes problems.

          Terry

          • Terry RT
            Terry R
            last edited by

            @Neculai-I-Fantanaru
            I had a little trouble with my posting. I deleted the first one because it contained an incorrect result for 1 tab, and now I see I’ve made another mistake. See below.

            @Terry-R said:

            As your example states there is to be a blank line following each pair we still have to edit the 2 file/tab contents to achieve this. In this tab use the following regex to do that.
            Find What:^(\d+)( )(.+)(\R)
            Replace With:\1a\2\3\4\1b\4

            This step ONLY needs doing on the 1 tab that contains the 2nd line of each pair. So as per the example we’d pick the tab that contains “My mother is home”.

            Sorry about that. I suggest you run through these steps once with your examples, so you understand what is going on, before committing the REAL data to the process.

            Terry

            • guy038G
              guy038
              last edited by guy038

              Hello @neculai-i.-fantanaru, and All,

              Assuming :

              • Two blocks of non-empty lines of, exactly, n lines, separated with a line of, at least, 3 tildes

              • The first block of lines corresponds to the lines which come in second position, in the two-lines blocks of the final text

              Then, a generic regex S/R could be :

              SEARCH (?-s)^.+\R(?=(?:.+\R){n}(.+(\R)))|(?s)~~~.+

              REPLACE ?1\1$0\2


              So, from the initial text, below :

              My mother is home.
              My father is with my sister.
              I have to go home.
              I need some help.
              God is everywhere.
              My dog is with her cat.
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
              https://sentence number one
              sentence number two
              www.sentence number three
              sentence number four
              sentence number five
              23 sentence number six
              

              As your text contains two blocks of six lines, n = 6. So, the correct regex, in your case, is :

              SEARCH (?-s)^.+\R(?=(?:.+\R){6}(.+(\R)))|(?s)~~~.+

              REPLACE ?1\1$0\2

              After a click on the Replace All button, you’ll get the expected text :

              https://sentence number one
              My mother is home.
              
              sentence number two
              My father is with my sister.
              
              www.sentence number three
              I have to go home.
              
              sentence number four
              I need some help.
              
              sentence number five
              God is everywhere.
              
              23 sentence number six
              My dog is with her cat.
              

              Remark : If the first block of lines corresponds to the lines which come first, in the two-lines blocks, just change the Replace part as below :

              REPLACE ?1$0\1\2
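As a cross-check outside Notepad++, the same search/replace idea can be sketched in Python. This is an approximation under stated assumptions, not the regex from the post: Python's `re` module has no `\R` and no conditional replacement syntax such as `?1...`, so the sketch assumes `\n` line endings, uses `[\s\S]+` in place of `(?s).+`, and uses a callback for the replacement. The function name `interleave_with_regex` is hypothetical.

```python
import re


def interleave_with_regex(text, n):
    """Put each line of the second block in front of its partner in the first block.

    Assumes two blocks of exactly n non-empty lines, separated by a line
    of at least three tildes, with every line ending in "\n".
    """
    pattern = re.compile(
        r"^.+\n(?=(?:.+\n){%d}(.+\n))|~~~[\s\S]+" % n,
        re.MULTILINE,
    )

    def replacement(match):
        if match.group(1) is not None:
            # Partner line, then the matched line, then a blank line.
            return match.group(1) + match.group(0) + "\n"
        # Second alternative: the separator and everything after it is removed.
        return ""

    return pattern.sub(replacement, text)
```

For the remark about the reversed order, swapping the first two terms in `replacement` (matched line first, partner second) gives the other arrangement.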

              Cheers,

              guy038


              • Terry RT
                Terry R
                last edited by

                @guy038
                A question regarding your solution. I was also thinking along those lines but I recalled a conversation in a past posting where you outlined issues when the files became large, specifically around the 21000 line mark. I refer to:
                https://notepad-plus-plus.org/community/topic/16489/delete-both-duplicates-regexp-macro/20?page=1

                Does this regex (you supplied) also suffer from the same issues, if the number of lines is too large? Another posting (which I cannot find right now) I believe also came to a similar conclusion, in that any regex with a lookahead can be overwhelmed by the number of lines and incorrectly select ALL remaining text.

                Terry

                • guy038G
                  guy038
                  last edited by guy038

                  Hi, @neculai-i.-fantanaru, @terry-r and All,

                  You’re quite right, Terry, about that limit. Indeed if we exceed this limit, the regex engine, wrongly, matches all the file contents !

                  I did some tests, copying many blocks of text, as below, in order to get a huge block :

                  My mother is home.
                  My father is with my sister.
                  I have to go home.
                  I need some help.
                  God is everywhere.
                  My dog is with her cat.
                  

                  And it happens that, with my configuration, the limit is 47,146 consecutive lines in a block ( 1,052,927 bytes ):

                  My mother is home.                 )
                  My father is with my sister.       )
                  I have to go home.                 )
                  I need some help.                  )  = 7857 blocks of 6 lines = 47,142 lines    ( 1,052,838 bytes )
                  God is everywhere.                 )
                  My dog is with her cat.            )
                  My mother is home.                 
                  My father is with my sister.                                   +      4 lines    (        89 bytes )
                  I have to go home.
                  I need some help.                                               -------------    -------------------
                                                                                   47,146 lines      1,052,927 bytes
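The totals in the table can be re-derived with a few lines of Python. This sketch assumes Windows CRLF line endings (2 bytes per line) and a single-byte encoding, which is what makes the byte counts come out as quoted; neither assumption is stated in the post.

```python
block = [
    "My mother is home.",
    "My father is with my sister.",
    "I have to go home.",
    "I need some help.",
    "God is everywhere.",
    "My dog is with her cat.",
]
EOL_BYTES = 2  # assumption: each line ends with CR LF


def size_in_bytes(lines):
    """Bytes used by the given lines, including their line endings."""
    return sum(len(line.encode("ascii")) + EOL_BYTES for line in lines)


full_blocks = 7857   # complete copies of the 6-line block
extra_lines = 4      # lines of the final, partial copy

total_lines = full_blocks * len(block) + extra_lines
total_bytes = full_blocks * size_in_bytes(block) + size_in_bytes(block[:extra_lines])
print(total_lines, total_bytes)  # 47146 1052927
```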
                  

                  Why ?! Maybe it varies with the length of the lines or with the total number of bytes in a block ? I didn’t dig into it any further ;-))

                  Cheers,

                  guy038

                  • Terry RT
                    Terry R
                    last edited by

                    @guy038

                    Thanks for doing that quick test. Interesting that the limit (in this case) was vastly different from the one in the link I mentioned. It does suggest that we cannot rely on a specific number of lines, above which the lookahead fails.

                    It would seem there is some unknown interaction between the complexity of the regex and perhaps the number of characters being worked through. Regardless, not being able to say (with certainty) that the issue occurs at “x” lines or “y” characters does somewhat diminish the lookahead as a useful tool.

                    Cheers
                    Terry

                    • Scott SumnerS
                      Scott Sumner @guy038
                      last edited by Scott Sumner

                      My opinion is that there isn’t some magical # of lines that can’t be exceeded before this “select all” issue rears its ugly head. It is going to depend upon the regex used and the data it is used upon. All regex engines are going to have their limitations (stack depth, memory buffer sizes, and such things…), and Notepad++’s is no different. It just so happens that with Notepad++, instead of the user receiving an appropriate indication that a catastrophic limit has been reached, it just, well…selects all the text to be the “match” and presents that to the user. Ugh.

                      I think our old friend @Claudia-Frank spoke well to this here, which I may have cited before in other threads. Claudia gives a rationale for why the “match” found “starts” at the beginning of file; I’ll wager there is a correspondingly similar reason why the “match” “ends” at the end of file (although I haven’t investigated that–but I may be doing just that…).

                      • Neculai I. FantanaruN
                        Neculai I. Fantanaru
                        last edited by

                        @guy038 said:

                        SEARCH (?-s)^.+\R(?=(?:.+\R){6}(.+(\R)))|(?s)---.+
                        REPLACE ?1\1$0\2

                        Your regex is GREAT. But there is a little problem, @guy038. See this example:

                        My mother is home.
                        My father is with my sister.
                        I have to go home.
                        I need some help.
                        God is everywhere.
                        My dog is with her cat.
                        -----------
                        https://mywebsite.com/my-link-one.html
                        https://mywebsite.com/my-link-two.html
                        https://mywebsite.com/my-link-three.html
                        https://mywebsite.com/my-link-four.html
                        https://mywebsite.com/my-link-five.html
                        https://mywebsite.com/my-link-six.html
                        

                        After the search and replace (I tested the regex again), I cannot find the last line https://mywebsite.com/my-link-five.html

                        • Scott SumnerS
                          Scott Sumner @Neculai I. Fantanaru
                          last edited by Scott Sumner

                          @Neculai-I.-Fantanaru

                          I cannot find the last line https://mywebsite.com/my-link-five.html

                          Did you really mean to say: “I cannot find the last line https://mywebsite.com/my-link-six.html”?

                          If so, I can understand why you cannot find it: In your before-text that line (at end-of-file) probably doesn’t have a line-ending on it. Move to the end of that line and press Enter. Then try @guy038’s replacement operation again.
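The effect of a missing final line-ending can be reproduced with a small Python analogue of the search/replace. This is a sketch under assumptions, not the Notepad++ engine itself: `\n` replaces `\R`, a callback replaces the conditional replacement, and the helper names `interleave` and `ensure_final_newline` are hypothetical.

```python
import re


def interleave(text, n):
    """Python analogue of the search/replace, with a separator line of dashes."""
    pattern = re.compile(r"^.+\n(?=(?:.+\n){%d}(.+\n))|---[\s\S]+" % n, re.MULTILINE)

    def replacement(match):
        if match.group(1) is not None:
            return match.group(1) + match.group(0) + "\n"
        return ""  # separator and everything after it

    return pattern.sub(replacement, text)


def ensure_final_newline(text):
    """Same effect as moving to the end of the last line and pressing Enter."""
    return text if text.endswith("\n") else text + "\n"


# The last line has no line-ending, so it can never be captured as a partner.
broken = "first A\nfirst B\n---\nsecond A\nsecond B"
print(repr(interleave(broken, 2)))
print(repr(interleave(ensure_final_newline(broken), 2)))
```

In the first call the final line of the second block is swallowed by the "delete the separator and the rest" alternative, which matches the symptom described; in the second call both pairs come out intact.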

                          • Neculai I. FantanaruN
                            Neculai I. Fantanaru
                            last edited by

                            @guy038 said:

                            SEARCH (?-s)^.+\R(?=(?:.+\R){6}(.+(\R)))|(?s)---.+
                            REPLACE ?1\1$0\2

                            OK, I tried again, but with another scenario. Suppose there are many more lines. Try this, and you will see the problem.

                            My mother is home.
                            My father is with my sister.
                            I have to go home.
                            I need some help.
                            God is everywhere.
                            My dog is with her cat.
                            My mother is home.
                            My father is with my sister.
                            I have to go home.
                            I need some help.
                            God is everywhere.
                            My dog is with her cat.
                            My mother is home.
                            My father is with my sister.
                            I have to go home.
                            I need some help.
                            God is everywhere.
                            My dog is with her cat.
                            -----------
                            https://mywebsite.com/my-link-one.html
                            https://mywebsite.com/my-link-two.html
                            https://mywebsite.com/my-link-three.html
                            https://mywebsite.com/my-link-four.html
                            https://mywebsite.com/my-link-five.html
                            https://mywebsite.com/my-link-six.html
                            https://mywebsite.com/my-link-one.html
                            https://mywebsite.com/my-link-two.html
                            https://mywebsite.com/my-link-three.html
                            https://mywebsite.com/my-link-four.html
                            https://mywebsite.com/my-link-five.html
                            https://mywebsite.com/my-link-six.html
                            https://mywebsite.com/my-link-one.html
                            https://mywebsite.com/my-link-two.html
                            https://mywebsite.com/my-link-three.html
                            https://mywebsite.com/my-link-four.html
                            https://mywebsite.com/my-link-five.html
                            https://mywebsite.com/my-link-six.html
                            
                            • Scott SumnerS
                              Scott Sumner @Neculai I. Fantanaru
                              last edited by

                              @Neculai-I.-Fantanaru

                              So the expectation here, when people provide help is that you read, understand, and learn from what is provided. Sure, if this is a one-time need, I suppose you can just use it blindly and move on, without understanding or trying to learn its application. But this isn’t the case for you, as you are now trying to use the solution provided for a different set of data, without adjusting appropriately the provided solution.

                              Big hint: Go back and read the part where @guy038 says:

                              As your text contains two blocks of six lines => n = 6 So, the correct regex, in your case…

                              So see if you can adjust the provided solution to be correct for your new problem.

                              • Neculai I. FantanaruN
                                Neculai I. Fantanaru
                                last edited by

                                Yes, indeed. But I wanted to update my case with a more complex scenario, in case someone needs something different.

                                • Scott SumnerS
                                  Scott Sumner @Neculai I. Fantanaru
                                  last edited by

                                  @Neculai-I.-Fantanaru

                                  So you updated the scenario–fine…can you also present a corrected expression to match that scenario…as a demonstration of what you’ve learned from the help you’ve received?

                                  • Neculai I. FantanaruN
                                    Neculai I. Fantanaru
                                    last edited by

                                    Yes, it’s about that {6}. In the new scenario I have 18/18 lines. So, it works if I change it to {18}, like this:

                                    SEARCH: (?-s)^.+\R(?=(?:.+\R){18}(.+(\R)))|(?s)---.+
                                    REPLACE ?1\1$0\2

                                    But what if I have 1000/1110 lines (or if I don’t know the exact number of lines) ?

                                    I could try something like {1,6}, but that does not work.
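When the line count is not known in advance, one option is to let a short script count the lines and print the search expression with the right number filled in. The sketch below is only an illustration: the helper name `build_search_pattern` is hypothetical, and it assumes the separator is the first line that starts with `---` and that no data line starts that way.

```python
def build_search_pattern(text, separator="---"):
    """Return the Notepad++ search expression with n taken from the text itself."""
    lines = text.splitlines()
    # Index of the separator line; with 0-based indexing this is also
    # the number of lines above it, i.e. n.
    sep_index = next(
        (i for i, line in enumerate(lines) if line.startswith(separator)), None
    )
    if sep_index is None:
        raise ValueError("no separator line found")
    n = sep_index
    below = len(lines) - sep_index - 1
    if below != n:
        raise ValueError(f"blocks differ in length: {n} lines above, {below} below")
    return r"(?-s)^.+\R(?=(?:.+\R){" + str(n) + r"}(.+(\R)))|(?s)" + separator + ".+"
```

The returned string can be pasted into the Find what field; the check on `below` flags files whose two blocks do not have the same number of lines, a case the regex does not handle.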

                                    • Scott SumnerS
                                      Scott Sumner @Neculai I. Fantanaru
                                      last edited by

                                      @Neculai-I.-Fantanaru

                                      But what if I have 1000/1110 lines (or if I don’t know the exact number of lines) ?

                                      It’s not that hard to move the caret to the line above the ----------- and note the line number there by either looking at the line number in the margin (hopefully you have that turned on) or on the status bar. Then move the caret and make sure that the number of lines below the ----------- matches. This seems rather basic so I feel dumb even typing it.

                                      But in case this is valuable, here’s an example, using 2 sets of 10 lines each:

                                      Line 1: start of first group of data
                                      Line 10: end of first group of data
                                      Line 11: line of -----------
                                      Line 12: start of second group of data
                                      Line 21: end of second group of data

                                      For the general case of “N” lines:

                                      Line 1: start of first group of data
                                      Line N: end of first group of data
                                      Line N+1: line of -----------
                                      Line N+2: start of second group of data
                                      Line 2N+1: end of second group of data

                                      • Neculai I. FantanaruN
                                        Neculai I. Fantanaru
                                        last edited by

                                        The big problem, Scott, is that I have 500 files with more than 1000 lines each, which I have to change. I cannot move the caret to the line above the ----------- in every one of them. I must change them all at once with the “SEARCH AND REPLACE ALL” function, and I don’t know exactly how many lines are in each file. So, it’s not just one file.

                                        • Scott SumnerS
                                          Scott Sumner @Neculai I. Fantanaru
                                          last edited by

                                          @Neculai-I.-Fantanaru

                                          So the premise from the beginning, it seems, has been that the need is for a small number of pairs of files; mentioning that you had something like 500 would probably have been a good thing to do early on. :-)

                                          Do you have a plan for how to combine the pairs of files into one file with the line of --------- separating them? That seems key to the solution described thus far. There are some not-so-difficult ways, but specifying WHICH files and making sure there are the same number of lines in each…I don’t know, but I hope you do, as this is your problem and your data. Perhaps the regular expression can be altered to avoid the dependency on “n”, but unless you have solutions for some of the other technical difficulties, well, then…

                                          It seems we are getting very close to advising you that you need a programming language to do this more easily.
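To give an idea of what the programming-language route might look like, here is a minimal Python sketch that interleaves two files directly, with no separator line and no regex. Everything about it is an assumption for illustration: the function names are hypothetical, the files are assumed to be UTF-8, and how the 500 pairs are named and matched up is left open because the thread does not say.

```python
from pathlib import Path


def interleave_lines(first_lines, second_lines):
    """Return first/second pairs, each followed by a blank line."""
    if len(first_lines) != len(second_lines):
        raise ValueError(
            f"line counts differ: {len(first_lines)} vs {len(second_lines)}"
        )
    merged = []
    for first, second in zip(first_lines, second_lines):
        merged.extend([first, second, ""])
    return merged


def interleave_files(first_path, second_path, out_path):
    """Read two text files and write their interleaved lines to out_path."""
    first = Path(first_path).read_text(encoding="utf-8").splitlines()
    second = Path(second_path).read_text(encoding="utf-8").splitlines()
    merged = interleave_lines(first, second)
    Path(out_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
```

Because the line count is taken from the files themselves, there is no `n` to adjust per file, and a mismatch between the two files is reported instead of producing a silently wrong result. Looping `interleave_files` over many pairs is then a matter of deciding which file goes with which.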

                                          And yes, people, I do realize this discussion is getting a bit outside the realm of Notepad++ discussion. :-)

                                          The Community of users of the Notepad++ text editor.