Find and Replace: Multiple Replacements in Part of a String



  • Hi,

    I’m using the built-in Find and Replace tool (CTRL+H) with regular expressions and case sensitivity turned on. I’m limited to vanilla Notepad++ (no plugins, no Python, etc.).

    I have a file of IDs and strings, separated by a comma and structured like so:

    ID, String
    12345-01, A+A*2B+B*+A
    12345-02, A+AB+B*+AA

    I want to make the following replacements but only in the string (after the comma).
    +A* --> 1
    +A --> a
    +B* --> 2
    +B --> b
    2 --> X

    As an example, “12345-01, A+A*2B+B*+A” should be changed to “12345-01, A1XB2a”.

    If the ID were not there, the following would work like a charm:
    Search for: (\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)
    Replace with: (?{1}1)(?{2}a)(?{3}2)(?{4}b)(?{5}X)
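    To see why the ID gets in the way, here is a minimal Python sketch that emulates the conditional replacement with a callback. Python is used only for illustration (Notepad++ itself uses the Boost regex engine), and the names PATTERN, REPLACEMENTS, conditional_replace and apply are hypothetical. Because the pattern has no notion of "after the comma", the 2 inside 12345 is turned into an X as well.

```python
import re

# Same alternation as the Notepad++ "Search for" field; capture groups 1..5.
PATTERN = re.compile(r"(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)")

# Text for each group, mirroring (?{1}1)(?{2}a)(?{3}2)(?{4}b)(?{5}X).
REPLACEMENTS = {1: "1", 2: "a", 3: "2", 4: "b", 5: "X"}


def conditional_replace(m):
    # Only one top-level group can take part in a match, so lastindex
    # is the number of the alternative that matched.
    return REPLACEMENTS[m.lastindex]


def apply(text):
    # re.sub makes a single left-to-right pass, so a "2" produced by +B*
    # is never rescanned and turned into an X.
    return PATTERN.sub(conditional_replace, text)


if __name__ == "__main__":
    print(apply("A+A*2B+B*+A"))            # string alone: A1XB2a
    print(apply("12345-01, A+A*2B+B*+A"))  # the ID's 2 is replaced too
```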

    However, when the ID is present I cannot seem to find a solution that will leave the ID unchanged while making all the replacements in the string.

    Do you have any suggestions?



  • @Anos said in Find and Replace: Multiple Replacements in Part of a String:

    that will leave the ID unchanged

    A quick “fix” might be to use:
    (\+A\*)|(\+A)|(\+B\*)|(\+B)|(\b2\b)
    So here the 2 must be at a “boundary”. For the 2 example lines provided it does work; however, 2 examples does NOT a book make! It will depend on whether each “2” in the rest of the expression has a non-word character (or the start or end of the line) on both sides.

    Terry



  • @Terry-R said in Find and Replace: Multiple Replacements in Part of a String:

    So here the 2 must be at a “boundary”

    Sorry, I jumped the gun slightly: it did work on the 2nd example, but I missed that it didn’t work on the first one. Yes, it IS a bit of a poser. It will involve a bit more thought.
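    A quick Python sketch (an emulation, not Notepad++ itself; the helper names are hypothetical) shows where the \b2\b version goes wrong. In the first line the 2 sits between * and B, and since 2 and B are both word characters there is no boundary after the 2, so it is left alone. The 2s inside the IDs are also skipped, which is why the second line comes out right.

```python
import re

# Terry's quick fix: only a "2" with a word boundary on both sides matches.
PATTERN = re.compile(r"(\+A\*)|(\+A)|(\+B\*)|(\+B)|(\b2\b)")
REPLACEMENTS = {1: "1", 2: "a", 3: "2", 4: "b", 5: "X"}


def apply(text):
    # lastindex identifies which top-level alternative matched.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.lastindex], text)


for line in ["12345-01, A+A*2B+B*+A", "12345-02, A+AB+B*+AA"]:
    # First line keeps its "2" (no boundary before the B); second is correct.
    print(apply(line))
```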

    Terry



  • @Terry-R said in Find and Replace: Multiple Replacements in Part of a String:

    It will involve a bit more thought.

    Sorry about that false start; I think I now have it. We have:
    FW: (?-s)((\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,)
    RW: (?{2}1)(?{3}a)(?{4}2)(?{5}b)(?{6}X)

    So, as I had to add a negative lookahead (and wrap the alternatives in brackets so that it applies to all of them), the bracket numbering all changed, hence a new Replace with code as well.
    So basically, whenever it finds one of the target strings, it is changed as long as there is no comma after it on the same line. As the ID is before the comma, nothing there should be changed.
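    Here is a rough Python emulation of this lookahead version, assuming Python's re behaves like Boost for these constructs (the helper names are hypothetical). Python's dot already excludes line breaks, so the (?-s) modifier is simply dropped. Because of the wrapping group, the targets are groups 2..6, and the callback has to look for the inner group that matched.

```python
import re

# Lookahead version: the wrapper is group 1, the targets are groups 2..6.
PATTERN = re.compile(r"((\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,)")
REPLACEMENTS = {2: "1", 3: "a", 4: "2", 5: "b", 6: "X"}


def conditional_replace(m):
    # lastindex would report the wrapper (group 1), because it closes last,
    # so find the inner group that actually took part in the match.
    for group, text in REPLACEMENTS.items():
        if m.group(group) is not None:
            return text
    return m.group(0)  # not reached: one inner group always matches


def apply(text):
    # "." does not cross a newline, so the lookahead only sees the current line.
    return PATTERN.sub(conditional_replace, text)


print(apply("12345-01, A+A*2B+B*+A\n12345-02, A+AB+B*+AA"))
```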

    Terry

    PS: I should have paid more attention to your statement:
    I want to make the following replacements but only in the string (after the comma).



  • @Anos

    I came up with this; it seems to work but may have holes:

    find: (^[^,]+,)|(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)
    repl: (?{1}\1)(?{2}1)(?{3}a)(?{4}2)(?{5}b)(?{6}X)

    The result of the replacement:

    12345-01, A1XB2a
    12345-02, AaB2aA
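    The trick here is that group 1 consumes the ID and the comma at the start of each line, so the engine only resumes scanning after the comma, and (?{1}\1) puts the ID back unchanged. A minimal Python sketch of the same idea (hypothetical helper names; re.MULTILINE is assumed to stand in for Notepad++'s default behaviour of ^ matching at every line start):

```python
import re

# Group 1 swallows "ID," at a line start; targets are groups 2..6.
PATTERN = re.compile(r"(^[^,]+,)|(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)", re.MULTILINE)
REPLACEMENTS = {2: "1", 3: "a", 4: "2", 5: "b", 6: "X"}


def conditional_replace(m):
    if m.lastindex == 1:
        return m.group(1)  # (?{1}\1): write the ID back as it was
    return REPLACEMENTS[m.lastindex]


def apply(text):
    return PATTERN.sub(conditional_replace, text)


print(apply("12345-01, A+A*2B+B*+A\n12345-02, A+AB+B*+AA"))
```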
    


  • @Terry-R said in Find and Replace: Multiple Replacements in Part of a String:

    So as I had to add a negative lookahead the bracket numbering all changed

    You could have made the wrapping parentheses a non-capturing group: (?-s)(?:(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,), to avoid the renumbering in the replacement.
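    A small Python sketch of the non-capturing variant (an emulation only; helper names are hypothetical) confirms that the pattern then has 5 capture groups, not 6, so the original mapping 1→1, 2→a, 3→2, 4→b, 5→X can be reused as-is.

```python
import re

# (?:...) groups the alternation for the lookahead without capturing it.
PATTERN = re.compile(r"(?:(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,)")

# Same mapping as the original (?{1}1)(?{2}a)(?{3}2)(?{4}b)(?{5}X).
REPLACEMENTS = {1: "1", 2: "a", 3: "2", 4: "b", 5: "X"}


def apply(text):
    # With no wrapper group, lastindex is the one group that matched.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.lastindex], text)


print(PATTERN.groups)  # number of capture groups
print(apply("12345-01, A+A*2B+B*+A"))
```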

    TIMTOWTDI (There Is More Than One Way To Do It)



  • @Alan-Kilborn said in Find and Replace: Multiple Replacements in Part of a String:

    find: (^[^,]+,)|(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)
    repl: (?{1}\1)(?{2}1)(?{3}a)(?{4}2)(?{5}b)(?{6}X)

    I vote for yours. As an interesting aside, using regex101.com with the 2 example lines and the Find What expression, my code took twice as long to process as @Alan-Kilborn’s. The lookahead, which scans the rest of the line after every candidate match, is presumably where the extra time is spent.

    For a small file it may not mean a lot, but efficiency can sometimes be an advantage, hence my vote for @Alan-Kilborn’s code.
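    If you want to repeat that kind of comparison outside regex101, here is a rough Python timing sketch. Note that regex101.com and Notepad++ use different engines from Python's re, so the ratio you get may well differ from the one above; the function names are hypothetical, and the script first checks that both approaches give the same result.

```python
import re
import timeit

LINES = "12345-01, A+A*2B+B*+A\n12345-02, A+AB+B*+AA"

# Lookahead version with a non-capturing wrapper: targets are groups 1..5.
LOOKAHEAD = re.compile(r"(?:(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,)")
LOOKAHEAD_MAP = {1: "1", 2: "a", 3: "2", 4: "b", 5: "X"}

# Skip-the-ID version: group 1 is the ID, targets are groups 2..6.
SKIP_ID = re.compile(r"(^[^,]+,)|(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)", re.MULTILINE)
SKIP_ID_MAP = {2: "1", 3: "a", 4: "2", 5: "b", 6: "X"}


def run_lookahead(text=LINES):
    return LOOKAHEAD.sub(lambda m: LOOKAHEAD_MAP[m.lastindex], text)


def run_skip_id(text=LINES):
    return SKIP_ID.sub(
        lambda m: m.group(1) if m.lastindex == 1 else SKIP_ID_MAP[m.lastindex],
        text,
    )


if __name__ == "__main__":
    # Both approaches must agree before timing them means anything.
    assert run_lookahead() == run_skip_id()
    for name, fn in [("lookahead", run_lookahead), ("skip ID", run_skip_id)]:
        seconds = timeit.timeit(fn, number=10_000)
        print(f"{name}: {seconds:.3f} s for 10,000 runs")
```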

    Terry



  • @Terry-R This does seem to work as intended, at least with my limited testing. Thank you very much for your quick replies. I have never really familiarized myself with lookaheads; they certainly look useful, though.



  • @Anos said in Find and Replace: Multiple Replacements in Part of a String:

    I have never really familiarized myself with lookaheads

    There are LOTS of wonderful things to try and remember, as @PeterJones just reminded me. I should have made that a non-capture group; then it would not have required a rejig of the Replace with code.

    As I always say
    “The day you stop learning is the day you die”

    Terry



  • @Alan-Kilborn Thank you for this solution. It also gets the job done and, as @Terry-R points out, it seems to be more efficient.



  • Hello, @anos, @terry-r, @alan-kilborn, @peterjones and All,

    And here is my solution!

    If we use the FREE-SPACING mode (?x) for the SEARCH part:
    
    SEARCH  (?x-s)  (?: ( \+A (\*)? ) | ( \+B (\*)? ) | (2) )  (?!.*,)
    Groups -->      No  1     2         3     4         5     Look-Ahead  
    
    REPLACE (?1(?{2}1:a))(?3(?{4}2:b))?5X
    
    BEWARE that the FREE-SPACING mode CANNOT be used in the REPLACE part (any space there would be inserted literally). So, ONLY for INFO, here is a spaced-out version:
    
    REPLACE ( ?1 ( ?{2} 1 : a ) )  ( ?3 ( ?{4} 2 : b ) ) ?5 X
    

    and given the data:

    12345-01, A+A*2B+B*+A
    12345-02, A+AB+B*+AA
    

    it would return:

    12345-01, A1XB2a
    12345-02, AaB2aA
    

    Notes:

    • The first part (?x-s) of the regex search means that :

      • The free-spacing mode is set (spaces are not taken into account, except inside the [ ] syntax or as an escaped space char)

      • Due to the (?-s) syntax, the dot regex symbol matches a single standard char only (not an EOL char)

    • Then, the (?:......) syntax defines a non-capturing group

    • Now, in this non-capturing group, we have 3 alternatives, and the first two contain an optional inner group (\*)? (remember that ? is another form of the {0,1} quantifier)

    • At the end, the whole regex will match ONLY IF the final negative look-ahead (?!.*,) succeeds, that is to say, if there is no comma at any later position on the current line, starting from the position reached by the regex engine

    • Now, in the replacement regex :

      • The (?1(?{2}1:a)) syntax means that if group 1 exists, then if group 2 exists, write 1, else write a

      • The (?3(?{4}2:b)) syntax means that if group 3 exists, then if group 4 exists, write 2, else write b

      • Finally, ?5X means that if group 5 exists, then write an X (the parentheses are not mandatory, as this part ends the replacement)

      • Note also that it’s not necessary to surround the group numbers 1, 3 and 5 with braces, as these groups are not immediately followed by a digit!
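    For anyone who wants to trace the nested conditionals step by step, here is a Python sketch of the same logic. It is an emulation under stated assumptions, not Notepad++ itself: re.VERBOSE is taken as Python's counterpart of (?x), (?-s) is dropped because Python's dot already excludes line breaks, and the callback nested_replace is a hypothetical stand-in for the Boost replacement string.

```python
import re

# Free-spacing pattern: whitespace and #-comments are ignored under re.VERBOSE.
PATTERN = re.compile(
    r"""
    (?:                 # non-capturing group of three alternatives
        ( \+A (\*)? )   # group 1: +A, group 2 set only if a * follows
      | ( \+B (\*)? )   # group 3: +B, group 4 set only if a * follows
      | (2)             # group 5: a literal 2
    )
    (?!.*,)             # only if no comma follows on the same line
    """,
    re.VERBOSE,
)


def nested_replace(m):
    # (?1(?{2}1:a)): if group 1 matched, write 1 when group 2 matched, else a
    if m.group(1) is not None:
        return "1" if m.group(2) is not None else "a"
    # (?3(?{4}2:b)): if group 3 matched, write 2 when group 4 matched, else b
    if m.group(3) is not None:
        return "2" if m.group(4) is not None else "b"
    # ?5X: otherwise group 5 (the 2) matched, so write X
    return "X"


def apply(text):
    return PATTERN.sub(nested_replace, text)


print(apply("12345-01, A+A*2B+B*+A\n12345-02, A+AB+B*+AA"))
```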

    Best Regards,

    guy038

