Append third row of text to a find replace?



  • I have 1k+ files that I need to find foo in and make it foo&3rdrow. Least complex way to make this happen without hand massaging each file? Thanks!



  • Hello @Matthew-suhajda,

    I’m trying to guess what you want here, but a literal example would be welcome !

    For instance, from the initial text, below :

    Line foo 1
    Line 2
    Line 3
    Line 4
    foo Line 5 
    Line 6
    Line 7 foo
    

    are you expecting the following text ?

    Line foo&Line3 1
    Line 2
    Line 3
    Line 4
    foo&Line3 Line 5 
    Line 6
    Line 7 foo&Line3
    

    If so, I’ll post a solution, next time, which does the job with two consecutive regex search/replacements !

    See you later,

    Best Regards,

    guy038



  • All 1k files have a date and time stamp at line 3. I need to append that stamp to several places in each file before proceeding to clean up the rest of the data and combine all the files. So more like

    Line 1
    Line 2
    Line 3 1/1/2008 14:53:36
    Line 4 foo
    Line 5 dsafoj
    Line 6 adsaf foo
    Line 7
    Line 8 12341234 foo sdfsd

    I think your example has it correct, but hopefully the above illustrates the case a tad better.



  • Hi, @Matthew-suhajda,

    OK for the time stamp, in line 3 of your files, but you didn’t say what text you expect after the replacement !

    I mean : should the generic foo expression, located, in your example, in lines 4, 6 and 8, be replaced with :

    • A : foo&1/1/2008 14:53:36 , with a literal & between ?

    • B : foo1/1/2008 14:53:36 , simply attached ?

    • C : foo 1/1/2008 14:53:36, with a space separator ?

    • D : Other Case ?

    Cheers,

    guy038



  • A space between would be perfect.



  • Note: Pasted as an image because of the stupid spam filter!! :-D

    [two images, hosted on Imgur]



  • Hello, @Matthew-suhajda, and All,

    Well ! My idea consists of three steps :

    • Firstly, copy the time stamp on the 3rd line ( which I suppose to be different for each file ! ) to the very end of each file scanned, after an empty line ( assuming the file already ends with a line-break ). This new line won’t be followed by any line-break

    • Secondly, search for any occurrence of the foo expression, with a look-ahead ( always true ) which stores, as group 1, the very last line ( the time stamp line ) added during the previous S/R step

    • Thirdly, delete, in each file scanned, the very last line, temporarily added

    The first step is done with a first regex S/R. The second and third ones are done, together, by a second regex S/R

    Note that it’s necessary to copy the time stamp line to the very end because, once the regex engine position is past line 3, looking for some foo occurrences, it can no longer capture that specific line, as it’s not part of the later matches !


    So, let’s imagine the text below, with the time stamp in line 3 and the foo word, in lines 2, 6, 8 and 10 :

    This is a small
    example foo of
    1/1/2008 14:53:36
    text for testing
    the Matthew's goal !
    foo It doesn't
    mean anything
    and foo it's created
    to test the
    search/replacement foo
    That's the end.
    

    Then, the first regex S/R :

    SEARCH (?-s)^(?:.*\R){2}(.+)(?s).+

    REPLACE $0\r\n\1

    would give the following text, with a copy of the 3rd line added at the very end :

    This is a small
    example foo of
    1/1/2008 14:53:36
    text for testing
    the Matthew's goal !
    foo It doesn't
    mean anything
    and foo it's created
    to test the
    search/replacement foo
    That's the end.
    
    1/1/2008 14:53:36
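
    For readers who want to try this first step outside Notepad++, here is a minimal Python sketch of the same idea. It is an approximation, not the regex above verbatim: Python’s re module has no \R, so line breaks are written as \r?\n and the dots as [^\r\n], and the function name append_third_line is made up for this example.

```python
import re

# Rough Python counterpart of (?-s)^(?:.*\R){2}(.+)(?s).+
#   (?:[^\r\n]*\r?\n){2}  the first two lines, with their line breaks
#   ([^\r\n]+)            the 3rd line, stored as group 1
#   (?s:.+)               everything that follows, up to the end of the text
FIRST = re.compile(r"\A(?:[^\r\n]*\r?\n){2}([^\r\n]+)(?s:.+)")

def append_third_line(text, eol="\r\n"):
    """Return text followed by a line break and a copy of its 3rd line."""
    return FIRST.sub(lambda m: m.group(0) + eol + m.group(1), text, count=1)

sample = "one\r\ntwo foo\r\n1/1/2008 14:53:36\r\nfour foo\r\n"
result = append_third_line(sample)
print(repr(result))
```

    A text with fewer than three lines, or with nothing after line 3, does not match and comes back unchanged.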
    

    Now, the second regex S/R :

    SEARCH (?i)foo(?s)(?=.*\R(.+)\z)|(?-s)\R.+\z

    REPLACE ?1foo\x20\1

    gives the expected text, below :

    This is a small
    example foo 1/1/2008 14:53:36 of
    1/1/2008 14:53:36
    text for testing
    the Matthew's goal !
    foo 1/1/2008 14:53:36 It doesn't
    mean anything
    and foo 1/1/2008 14:53:36 it's created
    to test the
    search/replacement foo 1/1/2008 14:53:36
    That's the end.
    

    I assumed that the search should be case-insensitive, so the words FOO, Foo,… would match, too. If you prefer a case-sensitive search, just replace the leading (?i) part of the regex with the (?-i) syntax
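
    The second step can be sketched in Python as well, under the same assumptions as before ( \r?\n instead of \R, \Z instead of \z ). Python has no ?1 conditional replacement, so a callback plays that role. The name stamp_and_trim is hypothetical, and the callback keeps the matched text as it was found rather than writing a literal foo, which is a small deviation from the replacement above.

```python
import re

# Rough Python counterpart of (?i)foo(?s)(?=.*\R(.+)\z)|(?-s)\R.+\z
SECOND = re.compile(
    r"(?i:foo)(?=(?s:.*)\r?\n([^\r\n]+)\Z)"  # foo, with the last line captured as group 1
    r"|\r?\n[^\r\n]+\Z"                      # or the temporary last line itself
)

def stamp_and_trim(text):
    """Append the last line after every foo, then drop that last line."""
    def repl(m):
        if m.group(1) is not None:
            # First alternative: keep the match, add a space and the time stamp.
            return m.group(0) + " " + m.group(1)
        # Second alternative: delete the temporary last line.
        return ""
    return SECOND.sub(repl, text)

staged = ("Line 1\r\nLine 2 foo\r\n1/1/2008 14:53:36\r\n"
          "FOO end\r\n\r\n1/1/2008 14:53:36")
print(repr(stamp_and_trim(staged)))
```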


    Practically, Matthew, follow these few steps :

    • First, BACKUP all the files, concerned with these S/R ( IMPORTANT )

    • Start Notepad++ and open the Find in Files dialog

    • Type in (?-s)^(?:.*\R){2}(.+)(?s).+ , in the Find what: zone

    • Type in $0\r\n\1 , in the Replace with: zone

    • Enter the right extension of your files ( for instance *.txt, *.html, … ), in the Filters : zone

    • Add the full pathname of the folder, containing all your files, in the Directory : zone

    • Select the Regular expression search mode ( IMPORTANT )

    • Click on the Replace in Files button

    • Click on the OK button, of the confirmation dialog

    At this point, all the files scanned should have a new line, at their end, identical to their 3rd line ! Now :

    • Change the Find what: zone with the regex (?i)foo(?s)(?=.*\R(.+)\z)|(?-s)\R.+\z

    • Change the Replace with: zone with the replacement expression ?1foo\x20\1

    • Click, again, on the Replace in Files button

    • Click on the OK button, of the confirmation dialog

    Et voilà ! Any occurrence of foo, in each scanned file, should now be followed by a space separator and the appropriate time stamp of that file ;-))
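
    As an alternative to the Find in Files procedure above, the same result could be scripted. The sketch below is not the two-regex method: a script can read line 3 directly, so no temporary last line is needed. The names stamp_text and stamp_folder, the *.txt filter and the UTF-8 encoding are all assumptions to adapt, and the advice to BACKUP the files first still applies.

```python
import re
from pathlib import Path

FOO = re.compile(r"foo", re.IGNORECASE)  # case-insensitive, like (?i)foo

def stamp_text(text):
    """Append the text of line 3 after every foo; leave short texts untouched."""
    parts = re.split(r"\r\n|\n|\r", text, maxsplit=3)
    if len(parts) < 3 or not parts[2]:
        return text  # no usable 3rd line
    stamp = parts[2]
    return FOO.sub(lambda m: m.group(0) + " " + stamp, text)

def stamp_folder(folder, pattern="*.txt", encoding="utf-8"):
    """Rewrite every matching file in folder; return how many files changed."""
    changed = 0
    for path in sorted(Path(folder).glob(pattern)):
        # Bytes in, bytes out, so the original line endings are preserved.
        original = path.read_bytes().decode(encoding)
        updated = stamp_text(original)
        if updated != original:
            path.write_bytes(updated.encode(encoding))
            changed += 1
    return changed

# Hypothetical usage, on a backed-up copy of the files:
# stamp_folder(r"C:\path\to\files")
```

    Like the S/R version, this is not idempotent: running it twice on the same files appends the time stamp twice.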

    Best Regards,

    guy038

    P.S. :

    If you want to, I’ll give you, next time, some explanations about these regexes !!



  • @guy038 said:

    ?1foo\x20\1

    Exquisite. Worked perfectly. Hopefully this is the only time I’ll need to do something like this, but I will totally ask for more direction in the future if it comes up again. There’s always a new puzzle when dealing with shitty data! lol

    Thank you so very much.



  • Hello, @Matthew-suhajda, and All,

    Pleased to hear that it worked fine ! Just for information :

    Regarding the first S/R :

    SEARCH (?-s)^(?:.*\R){2}(.+)(?s).+

    REPLACE $0\r\n\1

    • The modifier (?-s) means that any further dot will match a single standard character only, and never a line-break character

    • Then, the part ^(?:.*\R){2} looks for the first two lines, with their EOL chars, in a non capturing group

    • Now, the part (.+) stores, as group 1, the next 3rd line, without its End of Line characters

    • Finally, the (?s).+ syntax, where the dot now matches line-break characters too, catches all the remaining text, from the End of Line characters of line 3 up to the very end of the file

    • In replacement, due to the $0 syntax, it re-writes, first, the entire matched text ( = file contents ), followed with a Windows line break ( \r\n ) and, finally, with the group 1 ( = The 3rd line = time stamp )
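
    The effect of the s modifier described above can be checked quickly in Python, whose re module uses the same (?s) notation. This is only an illustration: the sample text uses \n line breaks, because Python’s dot excludes only \n, and the scoped form (?s:...) stands in for switching the modifier on in the middle of the pattern.

```python
import re

text = "first\nsecond\nthird\nfourth\n"

# Without the s modifier, the dot stops at the line break.
print(re.match(r".+", text).group(0))              # first

# With the s modifier, the dot crosses line breaks and .+ runs to the end.
print(re.match(r"(?s).+", text).group(0) == text)  # True

# Both behaviours combined, as in the first search regex.
m = re.match(r"(?:.*\n){2}(.+)(?s:.+)", text)
print(m.group(1))          # third
print(m.group(0) == text)  # True
```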


    Regarding the second S/R :

    SEARCH (?i)foo(?s)(?=.*\R(.+)\z)|(?-s)\R.+\z

    REPLACE ?1foo\x20\1

    • The searched regex is made of two alternatives, separated by the alternation special character | :

      • (?i)foo(?s)(?=.*\R(.+)\z)

      • (?-s)\R.+\z

    • In the first alternative, the part (?i)foo tries, first, to match the foo word, in any case

    • Then, the (?s)(?=.*\R(.+)\z) syntax represents a look-ahead, (?=......), always true here thanks to the line added by the first S/R. Without consuming anything, it scans all text after the foo word, till the line-break before the last line ( .*\R ), and then the last line ( a copy of the 3rd ), without any line-break ( (.+)\z ), which is stored as group 1

    • Near the end of each file, the second alternative, (?-s)\R.+\z, matches the line-break before the very last line, and that last line ( the copy of the 3rd ), till the very end of each file ( \z )

    • In replacement, the ?1foo\x20\1 syntax means :

      • If group 1 exists, it writes the literal string foo ( so a match like FOO or Foo comes back in lower case ), followed with a space character and the time stamp ( last ) line ( \1 )

      • If group 1 does not exist ( case of the second alternative ), the very last line, temporarily added, is then, simply, deleted, as no ELSE part is present in the conditional replacement ?1..... !
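
    To see why the conditional replacement can tell the two alternatives apart, here is a small Python sketch that lists the matches of a simplified version of the search regex. It is only an illustration, with \n line breaks and a made-up STAMP line: the look-ahead captures the last line into group 1 without adding it to the match, while group 1 stays empty when the second alternative matches.

```python
import re

# Simplified counterpart of foo(?s)(?=.*\R(.+)\z)|(?-s)\R.+\z
pattern = re.compile(r"foo(?=(?s:.*)\n(.+)\Z)|\n.+\Z")
text = "a foo b\nc\n\nSTAMP"

matches = list(pattern.finditer(text))
for m in matches:
    # matched text, its position, and the content of group 1
    print(repr(m.group(0)), m.span(), repr(m.group(1)))

# 'foo' (2, 5) 'STAMP'
# '\nSTAMP' (10, 16) None
```

    The first match is only three characters long, even though its look-ahead read the text right up to the end.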

    Best Regards,

    guy038

