Search and replace between 2 files



  • Dear experts,

    I have two different files and I have to search and replace strings in this way:

    in the file A, I have to check if in the row there is the string [NO_CODE].
    Then, in the same row, I have to select the text between ; and [
    (for example the selection will be 5384578)
    The text may be of variable length.

    Sample file A
    XXX_020_G30;00;M9990940382;00;3;
    XXX_020_G30;00;5384578 [NO_CODE];00;3;
    XXX_020_G30;00;1214_020_341;00;2;
    XXX_020_G30;00;M9990940381;00;8;
    XXX_020_G30;00;1214_048_G37;00;1;

    In file B, I have to search for the string 5384578. In the row where it is found, I have to copy the string before the ; and use it to replace 5384578 [NO_CODE] in file A. If file B has no matching string, return to file A and skip to the next row.

    Sample file B
    M0100940001;6012602;
    M0100940002;6012606;
    M0100940003;6012605;
    M0220580002;5384578;
    M0220580004;5940029;
    M0220580005;5940030;
    M0220580007;5940111;
    M0220780001;5952013;

    Final file will be:

    XXX_020_G30;00;M9990940382;00;3;
    XXX_020_G30;00;M0220580002;00;3;
    XXX_020_G30;00;1214_020_341;00;2;
    XXX_020_G30;00;M9990940381;00;8;
    XXX_020_G30;00;1214_048_G37;00;1;

    Can anyone help me?



  • @Enzo-Turatti said in Search and replace between 2 files:

    Dear experts,
    I have two different files and I have to search and replace strings in this way

    I believe it is possible, however it will require a few steps. In broad terms:

    1. Bring the “key” to the front of the lines in both files. The two files could then be combined and sorted, which would put any possible pairs together.
    2. For each pair, adjust the line from file A by replacing the character sequence with the value from the file B line.
    3. Remove all of the lines from file B, and remove the key from the front of each remaining line.

    Sounds relatively straightforward. It only remains to be seen whether it works that easily in practice.
    Question: is the line order in file A important? That can be catered for, but it does complicate things somewhat.
    Terry
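Terry's three steps can be sketched outside the editor as a sort-merge. This is only an illustration in standalone Python, not something posted in the thread: the function name `sort_merge_replace` and the tagging scheme are assumptions, and remembering each file A line's position is just one way to deal with the line-order question.

```python
import re

# Matches "<key> [NO_CODE]" inside a file A line; group 1 is the key.
NO_CODE = re.compile(r"([^;]+) \[NO_CODE\]")

def sort_merge_replace(a_lines, b_lines):
    """Pair file A and file B lines by sorting on a shared key (hypothetical helper)."""
    tagged = []
    # Step 1: bring the key to the front. Source 1 = file A, source 0 = file B,
    # so for an equal key the file B entry sorts just before the file A lines.
    for pos, line in enumerate(a_lines):
        match = NO_CODE.search(line)
        tagged.append((match.group(1) if match else "", 1, pos, line))
    for line in b_lines:
        fields = line.split(";")
        if len(fields) >= 2:
            tagged.append((fields[1], 0, -1, fields[0]))
    tagged.sort()

    # Step 2: walk the sorted list and adjust each file A line that follows
    # a file B entry with the same key.
    result = list(a_lines)
    current_key, current_code = None, None
    for key, source, pos, payload in tagged:
        if source == 0:
            current_key, current_code = key, payload
        elif key and key == current_key:
            result[pos] = NO_CODE.sub(lambda m: current_code, payload, count=1)
    # Step 3: file B entries and keys never enter `result`, and the stored
    # positions keep file A in its original order.
    return result
```

Lines of file A without a match in file B are returned unchanged, which corresponds to "skip to the next row" in the original request.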



  • Hello @enzo-turatti and All,

    Here is a regex solution which works ONLY IF the number of lines in files A and B is not too large!

    • First, copy the contents of file A into a new file C

    • Add a dummy line of, at least, three = signs

    • Append the contents of file B under the line ======

    => So, the contents of file C should be :

    XXX_020_G30;00;M9990940382;00;3;
    XXX_020_G30;00;5384578 [NO_CODE];00;3;
    XXX_020_G30;00;1214_020_341;00;2;
    XXX_020_G30;00;M9990940381;00;8;
    XXX_020_G30;00;1214_048_G37;00;1;
    ======
    M0100940001;6012602;
    M0100940002;6012606;
    M0100940003;6012605;
    M0220580002;5384578;
    M0220580004;5940029;
    M0220580005;5940030;
    M0220580007;5940111;
    M0220780001;5952013;
    
    • Run the following regex S/R, against file C :

      • SEARCH (?s-i)([^;]+) \[NO_CODE\](?=.+===.+^(.+?);\1;)|^===.+

      • REPLACE \2

    You should obtain your expected text :

    XXX_020_G30;00;M9990940382;00;3;
    XXX_020_G30;00;M0220580002;00;3;
    XXX_020_G30;00;1214_020_341;00;2;
    XXX_020_G30;00;M9990940381;00;8;
    XXX_020_G30;00;1214_048_G37;00;1;
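
For anyone who wants to check the logic on the small sample outside Notepad++, a minimal sketch with Python's `re` module follows. It is a different engine from the one in N++, so this is only an approximation: `(?s-i)` is expressed through the `re.DOTALL` and `re.MULTILINE` flags (Python is case-sensitive by default and needs `MULTILINE` for `^` to match at line starts), the brackets are escaped as `\[NO_CODE\]`, and the result says nothing about how the N++ search behaves on big files. The name `apply_lookup_regex` is made up for the example.

```python
import re

# Combined "file C": file A, a separator line, then file B.
FILE_C = """\
XXX_020_G30;00;M9990940382;00;3;
XXX_020_G30;00;5384578 [NO_CODE];00;3;
XXX_020_G30;00;1214_020_341;00;2;
XXX_020_G30;00;M9990940381;00;8;
XXX_020_G30;00;1214_048_G37;00;1;
======
M0100940001;6012602;
M0100940002;6012606;
M0100940003;6012605;
M0220580002;5384578;
M0220580004;5940029;
M0220580005;5940030;
M0220580007;5940111;
M0220780001;5952013;
"""

# First alternative: the key before " [NO_CODE]", but only if a line
# "<code>;<key>;" exists after the separator (the code lands in group 2).
# Second alternative: the separator and everything after it.
PATTERN = re.compile(
    r"([^;]+) \[NO_CODE\](?=.+===.+^(.+?);\1;)|^===.+",
    re.DOTALL | re.MULTILINE,
)

def apply_lookup_regex(text):
    # When the second alternative matches, group 2 is unset and Python 3.5+
    # substitutes an empty string, which deletes the separator and file B.
    return PATTERN.sub(r"\2", text)

result = apply_lookup_regex(FILE_C)
print(result)
```

On this sample the second line becomes `XXX_020_G30;00;M0220580002;00;3;` and everything from the `======` line onwards is removed.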
    

    As said above, the behaviour of this regex depends heavily on the size of the files! It may fail and even leave file C empty!

    In that case, we’ll have to find another way, which will probably involve sorting!

    See you later,

    Best regards,

    guy038
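
If the files are too big for the single regex, one possible other way is to leave the editor's search out of it and do a plain dictionary lookup in Python. This is a different technique from the regex above and from sorting, offered only as a sketch: the helper names `build_lookup` and `replace_no_code` are hypothetical, and it assumes that when file B lists the same number twice, the first entry should win.

```python
import re

# Matches "<key> [NO_CODE]" inside a file A line; group 1 is the key.
NO_CODE = re.compile(r"([^;]+) \[NO_CODE\]")

def build_lookup(b_lines):
    """Map the second field of each file B line to its first field."""
    lookup = {}
    for line in b_lines:
        fields = line.strip().split(";")
        if len(fields) >= 2 and fields[1]:
            # Assumption: keep the first code seen for a duplicated number.
            lookup.setdefault(fields[1], fields[0])
    return lookup

def replace_no_code(a_lines, lookup):
    """Replace '<key> [NO_CODE]' with the code from file B, if there is one."""
    def substitute(match):
        # No entry in file B: return the matched text unchanged.
        return lookup.get(match.group(1), match.group(0))
    return [NO_CODE.sub(substitute, line) for line in a_lines]
```

Each file is read once and each lookup is a dictionary access, so the running time grows roughly linearly with the number of lines rather than depending on regex backtracking. The line lists can come from `open(...).read().splitlines()` on the two files.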



  • @guy038 said in Search and replace between 2 files:

    As said above, the behaviour of this regex depends heavily on the size of the files! It may fail and even leave file C empty!

    This failure comes up a lot in such solutions.

    @guy038 Have you ever switched to the following (as an experiment) when such a failure has occurred?:

    In the PythonScript console, do the following:

    editor.rereplace(r' search_expr ', r' replace_expr ')

    Specifically for the case above, it would be:

    editor.rereplace(r'(?s-i)([^;]+) [NO_CODE](?=.+===.+^(.+?);\1;)|^===.+', r'\2')

    I’m just wondering if this technique would work when the native N++ technique fails.



  • Hi, @enzo-turatti, @alan-kilborn and All,

    Alan, I’ve just changed your line :

    editor.rereplace(r'(?s-i)([^;]+)\x20[NO_CODE](?=.+===.+^(.+?);\1;)|^===.+', r'\2')
    

    as

    editor.rereplace(r'(?s-i)([^;]+)\x20\[NO_CODE\](?=.+===.+^(.+?);\1;)|^===.+', r'\2')
    

    in order for the regex to work! This seems to be a problem with the NodeBB forum software: to display \[ you must write \\[


    For testing, I used this example:

    XXX_020_G30;00;M9990940382;00;3;
    XXX_020_G30;00;5384578 [NO_CODE];00;3;    // Line to be MODIFIED
    XXX_020_G30;00;1214_020_341;00;2;
    XXX_020_G30;00;M9990940381;00;8;
    XXX_020_G30;00;1214_048_G37;00;1;
    .....
    XXX_020_G30;00;M9990940382;00;3;    \
    XXX_020_G30;00;1214_020_341;00;2;   |     // repeated 66,433 times 
    XXX_020_G30;00;M9990940381;00;8;    |
    XXX_020_G30;00;1214_048_G37;00;1;   /
    .....
    XXX_020_G30;00;M9990940382;00;3;
    XXX_020_G30;00;1214_020_341;00;2;         // ONCE more
    XXX_020_G30;00;M9990940381;00;8;
    XXX_020_G30;00;1214_048_G37;00;1;
    ======
    M0100940001;6012602;
    M0100940002;6012606;
    M0100940003;6012605;
    M0220580002;5384578;
    M0220580004;5940029;
    M0220580005;5940030;
    M0220580007;5940111;
    M0220780001;5952013;
    

    This gives a total of 5 + 4 x 66434 + 9 = 265,750 lines!
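
A test file with this layout can be generated rather than pasted by hand. The sketch below is only a convenience under assumptions: the function name `build_stress_test` is invented, and the default of 66,434 blocks corresponds to the 66,433 repetitions plus the one extra block shown in the example.

```python
# The first five lines of file A, including the single line to be modified.
HEAD = [
    "XXX_020_G30;00;M9990940382;00;3;",
    "XXX_020_G30;00;5384578 [NO_CODE];00;3;",
    "XXX_020_G30;00;1214_020_341;00;2;",
    "XXX_020_G30;00;M9990940381;00;8;",
    "XXX_020_G30;00;1214_048_G37;00;1;",
]

# The four-line block that is repeated to inflate the file.
BLOCK = [
    "XXX_020_G30;00;M9990940382;00;3;",
    "XXX_020_G30;00;1214_020_341;00;2;",
    "XXX_020_G30;00;M9990940381;00;8;",
    "XXX_020_G30;00;1214_048_G37;00;1;",
]

# The eight lines of file B.
FILE_B = [
    "M0100940001;6012602;",
    "M0100940002;6012606;",
    "M0100940003;6012605;",
    "M0220580002;5384578;",
    "M0220580004;5940029;",
    "M0220580005;5940030;",
    "M0220580007;5940111;",
    "M0220780001;5952013;",
]

def build_stress_test(repeats=66434):
    """Return the lines of the combined test file (hypothetical helper)."""
    lines = list(HEAD)
    for _ in range(repeats):
        lines.extend(BLOCK)
    lines.append("======")  # separator between file A and file B
    lines.extend(FILE_B)
    return lines

lines = build_stress_test()
print(len(lines))  # 5 + 4 * 66434 + 9 = 265750
```

Calling `build_stress_test(66434 + 12)` gives the larger 265,798-line variant.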

    • Running either the regex from within N++ or the one-line Python script above works in both cases, but the script is faster (13 s instead of 20 s for the regex, on my old Windows XP SP3 machine!)

    • Unfortunately, after adding only 48 lines (12 blocks of four lines), for a total of 265,798 lines:

      • The regex fails and deletes all the contents, leaving only two lines!

      • The python script also fails to do the replacement and reports this error :

    Traceback (most recent call last):
      File "D:\@@\785\plugins\Config\PythonScript\scripts\Test_Alan.py", line 2, in <module>
        editor.rereplace(r'(?s-i)([^;]+)\x20\[NO_CODE\](?=.+===.+^(.+?);\1;)|^===.+', r'\2')
    RuntimeError: The complexity of matching the regular expression exceeded predefined bounds.  Try refactoring the regular expression to make each choice made by the state machine unambiguous.  This exception is thrown to prevent "eternal" matches that take an indefinite period time to locate.
    

    Remarks :

    • The Python script’s behaviour is nicer because, when there is a problem, it does not change the file at all and just outputs an error :-))

    • The Python script seems significantly faster than the native N++ regex engine !

    • Finally, as the lines in @enzo-turatti’s example are rather short, both the regex S/R and the equivalent Python script seem to work with a 9 MB file! However, I created a case with only one match! As always, working with real data will settle our questions!
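
Since the script reports the failure as a `RuntimeError` and leaves the file alone, the call can be wrapped so that a script notices the problem and can fall back to something else. A minimal sketch, assuming only what the traceback above shows (that `editor.rereplace` raises `RuntimeError` when the complexity limit is hit); the wrapper name `try_rereplace` is hypothetical:

```python
def try_rereplace(editor, search, replace):
    """Run editor.rereplace and report whether it completed.

    `editor` is the PythonScript editor object; it is passed in as a
    parameter here so the function can also be exercised without N++.
    """
    try:
        editor.rereplace(search, replace)
    except RuntimeError as err:
        # Raised when the regex engine gives up on a match that is too complex.
        print("Regex replacement aborted: %s" % err)
        return False
    return True
```

Inside a PythonScript script it would be called as `try_rereplace(editor, r'...', r'\2')`, with the search expression from the script above.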

    Cheers,

    guy038



  • Thank you very much.
    This week I should have time to test your suggestions.

    Best regards!


  • Banned

    Hello, @Enzo-Turatti
    Please follow these steps to search and replace between 2 files:

    The Find in Files tab (Search > Find in Files or Ctrl+Shift+F) allows you to search and replace in multiple files with one action.

    Step 1: press Ctrl+F and then from the Find in Files tab options.
    Step 2: Put the string in the regex format of the Find What:
    ^. (PeopleSleptWith).*$*
    The string will go between the “()” parenthesis just as shown above in #1
    Step 3: Put the 5 spaces and then the Replace with: PeopleSleptWith 7 string
    Step 4: Put the Filters: as . or *.txt or whatever you are replacing file type wise
    Step 5: Put the Directory: where you want it to be
    Step 6: Check the Regular expression option
    Step 7: Select Replace in Files then Check the file(s) and all should be correct now

    I hope this information will be useful.
    Thank you.



  • @Makwana-Prahlad said in Search and replace between 2 files:

    PeopleSleptWith

    ???

    I think it is time for “banning” this individual.

    Grounds: He pollutes threads with useless, unrelated posts that will only confuse future readers seeking a solution to the problem originally posed.

    Need more evidence? Look at his previous postings.

