For regex users: how can I add some lines after some other lines (interleave them)



  • Hi, @neculai-i.-fantanaru, @terry-r and All,

    You’re quite right, Terry, about that limit. Indeed, if we exceed this limit, the regex engine wrongly matches the entire file contents!

    I did some tests, copying many blocks of text, as below, in order to get a huge block:

    My mother is home.
    My father is with my sister.
    I have to go home.
    I need some help.
    God is everywhere.
    My dog is with her cat.
    

    And it happens that, with my configuration, the limit is 47,146 consecutive lines in a block ( 1,052,927 bytes ):

    My mother is home.                 )
    My father is with my sister.       )
    I have to go home.                 )
    I need some help.                  )  = 7857 blocks of 6 lines = 47,142 lines    ( 1,052,838 bytes )
    God is everywhere.                 )
    My dog is with her cat.            )
    My mother is home.                 
    My father is with my sister.                                   +      4 lines    (        89 bytes )
    I have to go home.
    I need some help.                                               -------------    -------------------
                                                                     47,146 lines      1,052,927 bytes
    

    Why?! Maybe it varies with the length of the lines or with the total number of bytes in a block? I didn’t dig into it any further ;-))
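The line and byte tallies above can be re-derived with a few lines of Python. This is only a sanity check of the arithmetic, assuming the test text is plain ASCII and that every line ends with a 2-byte Windows line ending (CRLF); neither assumption is stated in the post, but both are consistent with the numbers reported.

```python
# Assumptions (not stated in the post): ASCII text, CRLF line endings.
BLOCK = [
    "My mother is home.",
    "My father is with my sister.",
    "I have to go home.",
    "I need some help.",
    "God is everywhere.",
    "My dog is with her cat.",
]
EOL_BYTES = 2  # CR + LF


def size_in_bytes(lines):
    """Bytes used by the given lines, line endings included."""
    return sum(len(line.encode("ascii")) + EOL_BYTES for line in lines)


full_blocks = 7857  # complete 6-line blocks
extra_lines = 4     # lines of the last, partial block

total_lines = full_blocks * len(BLOCK) + extra_lines
total_bytes = (full_blocks * size_in_bytes(BLOCK)
               + size_in_bytes(BLOCK[:extra_lines]))

print(total_lines, total_bytes)  # 47146 1052927
```

Under those assumptions one 6-line block is 134 bytes, which reproduces both figures in the tally.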

    Cheers,

    guy038



  • @guy038

    Thanks for doing that quick test. Interesting that the limit (in this case) was vastly different from the one in the link I mentioned. It does suggest that we cannot rely on a specific number of lines above which the lookahead fails.

    It would seem there is some unknown interaction between complexity of the regex and perhaps the number of characters being worked through. Regardless, not being able to say (with certainty) the issue occurs with “x” number of lines or “y” characters does somewhat diminish the lookahead function as a useful tool.

    Cheers
    Terry



  • My opinion is that there isn’t some magical # of lines that can’t be exceeded before this “select all” issue rears its ugly head. It is going to depend upon the regex used and the data it is used upon. All regex engines are going to have their limitations (stack depth, memory buffer sizes, and such things…), and Notepad++’s is no different. It just so happens that with Notepad++, instead of the user receiving an appropriate indication that a catastrophic limit has been reached, it just, well…selects all the text to be the “match” and presents that to the user. Ugh.

    I think our old friend @Claudia-Frank spoke well to this here, which I may have cited before in other threads. Claudia gives a rationale for why the “match” found “starts” at the beginning of file; I’ll wager there is a correspondingly similar reason why the “match” “ends” at the end of file (although I haven’t investigated that–but I may be doing just that…).



  • @guy038 said:

    SEARCH (?-s)^.+\R(?=(?:.+\R){6}(.+(\R)))|(?s)---.+
    REPLACE ?1\1$0\2

    Your regex is GREAT. But there is a little problem, @guy038. See this example:

    My mother is home.
    My father is with my sister.
    I have to go home.
    I need some help.
    God is everywhere.
    My dog is with her cat.
    -----------
    https://mywebsite.com/my-link-one.html
    https://mywebsite.com/my-link-two.html
    https://mywebsite.com/my-link-three.html
    https://mywebsite.com/my-link-four.html
    https://mywebsite.com/my-link-five.html
    https://mywebsite.com/my-link-six.html
    

    After the search and replace (I tested the regex again), I cannot find the last line https://mywebsite.com/my-link-five.html



  • @Neculai-I.-Fantanaru

    I cannot find the last line https://mywebsite.com/my-link-five.html

    Did you really mean to say: “I cannot find the last line https://mywebsite.com/my-link-six.html”?

    If so, I can understand why you cannot find it: In your before-text that line (at end-of-file) probably doesn’t have a line-ending on it. Move to the end of that line and press Enter. Then try @guy038’s replacement operation again.
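The reason a missing final line ending matters here is that the search expression captures the partner line with `(.+(\R))`, so a last line with no line ending after it cannot be captured. As a minimal sketch (the helper name `ensure_trailing_newline` and the CRLF default are my own assumptions, not something from this thread), the same fix as pressing Enter at the end of the file could be done in Python like this:

```python
def ensure_trailing_newline(text, eol="\r\n"):
    """Append a line ending if the text does not already end with one.

    Hypothetical helper; `eol` defaults to the Windows line ending,
    which is an assumption about the files involved.
    """
    if text and not text.endswith(("\n", "\r")):
        return text + eol
    return text


sample = "https://mywebsite.com/my-link-five.html\r\nhttps://mywebsite.com/my-link-six.html"
fixed = ensure_trailing_newline(sample)
print(fixed.endswith("\r\n"))  # True
```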



  • @guy038 said:

    SEARCH (?-s)^.+\R(?=(?:.+\R){6}(.+(\R)))|(?s)---.+
    REPLACE ?1\1$0\2

    OK, I tried again, but with another scenario. Suppose there are many more lines. Try this, and you will see the problem.

    My mother is home.
    My father is with my sister.
    I have to go home.
    I need some help.
    God is everywhere.
    My dog is with her cat.
    My mother is home.
    My father is with my sister.
    I have to go home.
    I need some help.
    God is everywhere.
    My dog is with her cat.
    My mother is home.
    My father is with my sister.
    I have to go home.
    I need some help.
    God is everywhere.
    My dog is with her cat.
    -----------
    https://mywebsite.com/my-link-one.html
    https://mywebsite.com/my-link-two.html
    https://mywebsite.com/my-link-three.html
    https://mywebsite.com/my-link-four.html
    https://mywebsite.com/my-link-five.html
    https://mywebsite.com/my-link-six.html
    https://mywebsite.com/my-link-one.html
    https://mywebsite.com/my-link-two.html
    https://mywebsite.com/my-link-three.html
    https://mywebsite.com/my-link-four.html
    https://mywebsite.com/my-link-five.html
    https://mywebsite.com/my-link-six.html
    https://mywebsite.com/my-link-one.html
    https://mywebsite.com/my-link-two.html
    https://mywebsite.com/my-link-three.html
    https://mywebsite.com/my-link-four.html
    https://mywebsite.com/my-link-five.html
    https://mywebsite.com/my-link-six.html


  • @Neculai-I.-Fantanaru

    So the expectation here, when people provide help, is that you read, understand, and learn from what is provided. Sure, if this is a one-time need, I suppose you can just use it blindly and move on, without understanding or trying to learn its application. But this isn’t the case for you, as you are now trying to use the solution provided for a different set of data, without appropriately adjusting the provided solution.

    Big hint: Go back and read the part where @guy038 says:

    As your text contains two blocks of six lines => n = 6. So, the **correct** regex, in your case…

    So see if you can adjust the provided solution to be correct for your new problem.



  • Yes, indeed. But I wanted to update my case with a more complex scenario, in case someone else needs something different.



  • @Neculai-I.-Fantanaru

    So you updated the scenario–fine…can you also present a corrected expression to match that scenario…as a demonstration of what you’ve learned from the help you’ve received?



  • Yes, it’s about that {6}. In the new scenario I have 18/18 lines. So, it works if I change it to {18}, such as:

    SEARCH: (?-s)^.+\R(?=(?:.+\R){18}(.+(\R)))|(?s)---.+
    REPLACE ?1\1$0\2

    But what if I have 1000/1110 lines (or if I don’t know the exact number of lines)?

    I could try something like {1,6}, but that is not good.
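The quantifier has to be the exact block size, because the lookahead must skip exactly n lines to land on the partner line; a range such as {1,6} no longer pins it to that line. As a small sketch, the search expression can be generated for any n in Python. Note that this only builds the text to paste into Notepad++; the expression uses Boost syntax (`\R`, a mid-pattern `(?-s)`) that Python's own `re` module does not accept. The function name `build_search` is hypothetical.

```python
def build_search(n):
    """Return the search expression from this thread with block size n filled in.

    The result is meant for Notepad++'s Find/Replace dialog,
    not for Python's `re` module.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return r"(?-s)^.+\R(?=(?:.+\R){" + str(n) + r"}(.+(\R)))|(?s)---.+"


print(build_search(18))
# (?-s)^.+\R(?=(?:.+\R){18}(.+(\R)))|(?s)---.+
```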



  • @Neculai-I.-Fantanaru

    But what if I have 1000/1110 lines (or if I don’t know the exact number of lines)?

    It’s not that hard to move the caret to the line above the ----------- and note the line number there by either looking at the line number in the margin (hopefully you have that turned on) or on the status bar. Then move the caret and make sure that the number of lines below the ----------- matches. This seems rather basic so I feel dumb even typing it.

    But in case this is valuable, here’s an example, using 2 sets of 10 lines each:

    Line 1: start of first group of data
    Line 10: end of first group of data
    Line 11: line of -----------
    Line 12: start of second group of data
    Line 21: end of second group of data

    For the general case of “N” lines:

    Line 1: start of first group of data
    Line N: end of first group of data
    Line N+1: line of -----------
    Line N+2: start of second group of data
    Line 2N+1: end of second group of data
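The layout for the general case can also be checked mechanically. Here is a minimal sketch in Python, assuming the separator is the only line that starts with `---` (the function name `block_size` is made up for illustration). It returns N when the separator sits on line N+1 and exactly N lines follow it, and raises an error otherwise.

```python
def block_size(lines):
    """Return N if `lines` is N data lines, one separator, then N data lines.

    Assumption: the separator is the only line starting with '---'.
    """
    separators = [i for i, line in enumerate(lines) if line.startswith("---")]
    if len(separators) != 1:
        raise ValueError(f"expected exactly 1 separator line, found {len(separators)}")
    n = separators[0]              # 0-based index == number of lines above it
    below = len(lines) - n - 1     # lines after the separator
    if below != n:
        raise ValueError(f"{n} lines above the separator but {below} below")
    return n


# Two groups of 10 lines each, as in the example above: 21 lines in total.
lines = ([f"text line {i}" for i in range(1, 11)]
         + ["-----------"]
         + [f"link line {i}" for i in range(1, 11)])
print(block_size(lines))  # 10
```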



  • The big problem, Scott, is that I have 500 files with more than 1000 lines each, which I have to change. And I cannot move the caret to the line above the ----------- in every one of them. I must change them all at once with the “SEARCH AND REPLACE ALL” function. And I don’t know exactly how many lines are in each file. So, it’s not just one file.



  • @Neculai-I.-Fantanaru

    So the premise from the beginning, it seems, has been that the need is for only a few pairs of files; mentioning that you had something like 500 would probably have been a good thing to do early on. :-)

    Do you have a plan for how to combine the pairs of files into one file with the line of --------- separating them? That seems key to the solution described thus far. There are some not-so-difficult ways, but specifying WHICH files and making sure there are the same number of lines in each…I don’t know, but I hope you do, as this is your problem and your data. Perhaps the regular expression can be altered to avoid the dependency on “n”, but unless you have solutions for some of the other technical difficulties, well, then…

    It seems we are getting very close to advising you that you need a programming language to do this more easily.

    And yes, people, I do realize this discussion is getting a bit outside the realm of Notepad++ discussion. :-)
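On the programming-language route mentioned above, here is a minimal sketch in Python that does not depend on knowing “n” in advance. It assumes each combined file has the layout used throughout this thread (first block, one line starting with `---`, second block of the same length) and that the wanted result is each line of the first block followed by its partner from the second block, with the separator dropped. The function name `interleave` is hypothetical, and output lines are written with a plain `\n`.

```python
def interleave(text):
    """Pair line i of the first block with line i of the second block.

    Assumed layout: block 1, one separator line starting with '---',
    block 2 with the same number of lines. The separator is dropped.
    """
    lines = text.splitlines()
    sep = next((i for i, line in enumerate(lines) if line.startswith("---")), None)
    if sep is None:
        raise ValueError("no separator line found")
    first, second = lines[:sep], lines[sep + 1:]
    if len(first) != len(second):
        raise ValueError(f"block sizes differ: {len(first)} vs {len(second)}")
    out = []
    for text_line, link_line in zip(first, second):
        out.extend([text_line, link_line])
    return "\n".join(out) + "\n"


sample = (
    "My mother is home.\n"
    "My father is with my sister.\n"
    "-----------\n"
    "https://mywebsite.com/my-link-one.html\n"
    "https://mywebsite.com/my-link-two.html\n"
)
print(interleave(sample), end="")
```

Because the block size is read from the data itself, the same function could be applied to each of the 500 files in a loop; files whose two blocks differ in length are reported through the `ValueError` instead of being silently mangled.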

