    Find and Replace: Multiple Replacements in Part of a String

      A Former User

      Hi,

      I’m using the built-in find and replace tool (Ctrl+H) with case sensitivity turned on and regular expressions enabled. I’m limited to vanilla Notepad++ (no plugins, no Python, etc.).

      I have a file with sets of IDs and strings, separated by a comma, structured like so:

      ID, String
      12345-01, A+A*2B+B*+A
      12345-02, A+AB+B*+AA

      I want to make the following replacements but only in the string (after the comma).
      +A* --> 1
      +A --> a
      +B* --> 2
      +B --> b
      2 --> X

      As an example, “12345-01, A+A*2B+B*+A” should be changed to “12345-01, A1XB2a”.

      Now if the ID was not there the following works like a charm:
      Search for: (\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)
      Replace with: (?{1}1)(?{2}a)(?{3}2)(?{4}b)(?{5}X)
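
For anyone who wants to check this outside Notepad++, here is a minimal Python sketch of the same single-pass alternation. It is not the Notepad++ (Boost) replacement syntax: Python's `re` has no `(?{n}...)` conditionals, so a lookup table keyed on the matching group stands in for them. The names `PATTERN`, `REPLACEMENTS` and `replace_tokens` are made up for illustration.

```python
import re

# Same alternation as the "Search for" field; "+A*" is listed before "+A"
# so the longer token wins when both could match.
PATTERN = re.compile(r"(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)")

# Stand-in for (?{1}1)(?{2}a)(?{3}2)(?{4}b)(?{5}X): group number -> output.
REPLACEMENTS = {1: "1", 2: "a", 3: "2", 4: "b", 5: "X"}


def replace_tokens(text):
    # Exactly one group takes part in each match, so match.lastindex
    # is the number of the alternative that matched.
    return PATTERN.sub(lambda match: REPLACEMENTS[match.lastindex], text)


print(replace_tokens("A+A*2B+B*+A"))            # A1XB2a
print(replace_tokens("12345-01, A+A*2B+B*+A"))  # 1X345-01, A1XB2a
```

The second call shows the problem described next: with the ID present, the 2 inside the ID is replaced as well.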

      However, when the ID is present I cannot seem to find a solution that will leave the ID unchanged while making all the replacements in the string.

      Do you have any suggestions?

        Terry R @A Former User

        @Anos said in Find and Replace: Multiple Replacements in Part of a String:

        that will leave the ID unchanged

        A quick “fix” might be to use:
        (\+A\*)|(\+A)|(\+B\*)|(\+B)|(\b2\b)
        So here the 2 must be at a "boundary". For the 2 example lines provided it does work, however 2 examples do NOT a book make! It will depend on whether every “2” in the rest of the expression has a non-word character (or the end of the line) on both sides.

        Terry

          Terry R @Terry R

          @Terry-R said in Find and Replace: Multiple Replacements in Part of a String:

          So here the 2 must be at a "boundary"

          Sorry, I jumped the gun slightly: it did work on the 2nd example, but I missed that it didn’t work on the first one. Yes, it IS a bit of a poser. It will involve a bit more thought.

          Terry

            Terry R

            @Terry-R said in Find and Replace: Multiple Replacements in Part of a String:

            It will involve a bit more thought.

            Sorry about that false start, I think I now have it. We have
            Find What: (?-s)((\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,)
            Replace With: (?{2}1)(?{3}a)(?{4}2)(?{5}b)(?{6}X)

            So as I had to add a negative lookahead the bracket numbering all changed, hence the new Replace With code as well.
            So basically, whenever it finds one of the target sequences, it is changed only if no comma follows it on the line. As the ID is before the comma, nothing there should be changed.
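
As a rough cross-check outside Notepad++, the same lookahead idea can be sketched in Python. This is an illustration under assumptions, not the Boost engine Notepad++ uses: the leading (?-s) is dropped because Python's "." already stops at newlines by default, and a small function replaces the (?{n}...) conditionals. The names `REPLACEMENTS`, `replace_after_comma` and `sample` are hypothetical.

```python
import re

# The search pattern from above without (?-s); in Python "." does not
# match a newline unless re.DOTALL is set.
PATTERN = re.compile(r"((\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,)")

# Group 1 is the wrapping group, so the alternatives are groups 2 to 6.
REPLACEMENTS = {2: "1", 3: "a", 4: "2", 5: "b", 6: "X"}


def replace_after_comma(text):
    def pick(match):
        # Return the output for whichever alternative took part in the match.
        for group_number, replacement in REPLACEMENTS.items():
            if match.group(group_number) is not None:
                return replacement
        return match.group(0)  # not reached: one alternative always matches

    return PATTERN.sub(pick, text)


sample = "ID, String\n12345-01, A+A*2B+B*+A\n12345-02, A+AB+B*+AA"
print(replace_after_comma(sample))
```

On the sample data the IDs come through unchanged, because every 2 in an ID still has a comma somewhere after it on the same line.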

            Terry

            PS should have paid more attention to your statement
            I want to make the following replacements but only in the string (after the comma).

              Alan Kilborn

              @Anos

              I came up with this; seems to work but maybe has holes:

              find: (^[^,]+,)|(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)
              repl: (?{1}\1)(?{2}1)(?{3}a)(?{4}2)(?{5}b)(?{6}X)

              The result of the replacement with it:

              12345-01, A1XB2a
              12345-02, AaB2aA
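
A minimal Python sketch of the same "match the prefix and put it back" idea, for anyone testing outside Notepad++. Assumptions: re.MULTILINE is needed so that ^ matches at every line start (Notepad++ does this by default), and a function stands in for the (?{n}...) conditionals. The names `REPLACEMENTS` and `replace_keep_prefix` are invented for this example.

```python
import re

# Group 1 swallows everything up to and including the first comma on a line;
# once it is consumed, the token alternatives can only match after it.
PATTERN = re.compile(r"(^[^,]+,)|(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)", re.MULTILINE)

REPLACEMENTS = {2: "1", 3: "a", 4: "2", 5: "b", 6: "X"}


def replace_keep_prefix(text):
    def pick(match):
        if match.group(1) is not None:
            return match.group(1)  # same effect as (?{1}\1): write the ID back
        return REPLACEMENTS[match.lastindex]

    return PATTERN.sub(pick, text)


print(replace_keep_prefix("12345-01, A+A*2B+B*+A\n12345-02, A+AB+B*+AA"))
```

One possible hole: [^,] also matches line breaks, so a line with no comma at all would let group 1 run on into the next line.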
              
                PeterJones @Terry R

                @Terry-R said in Find and Replace: Multiple Replacements in Part of a String:

                So as I had to add a negative lookahead the bracket numbering all changed

                You could have made the wrapping parentheses a non-capturing group: (?-s)(?:(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,), to avoid the renumbering in the replacement.
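
The effect on group numbering can be checked with a short Python sketch (Python's `re` rather than Notepad++'s Boost engine, so (?-s) is left out; the variable names are made up). It only counts and reads groups, it does no replacing.

```python
import re

# Wrapping the alternation in a capturing group adds one group in front...
capturing = re.compile(r"((\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,)")
# ...while a non-capturing (?:...) wrapper leaves the numbering alone.
non_capturing = re.compile(r"(?:(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2))(?!.*?,)")

print(capturing.groups)      # 6
print(non_capturing.groups)  # 5

# The "+A*" alternative is group 2 in the first pattern, group 1 in the second.
print(capturing.search("x+A*").group(2))      # +A*
print(non_capturing.search("x+A*").group(1))  # +A*
```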

                TIMTOWTDI

                  Terry R @Alan Kilborn

                  @Alan-Kilborn said in Find and Replace: Multiple Replacements in Part of a String:

                  find: (^[^,]+,)|(\+A\*)|(\+A)|(\+B\*)|(\+B)|(2)
                  repl: (?{1}\1)(?{2}1)(?{3}a)(?{4}2)(?{5}b)(?{6}X)

                  I vote for yours. As an interesting aside, using regex101.com and inputting the 2 example lines and the Find What code, my code took twice as long as @Alan-Kilborn’s to process. It’s obvious the lookahead is where the extra time is spent.

                  For a small file it may not mean a lot, but sometimes efficiency in coding can be an advantage, hence my vote for @Alan-Kilborn’s code.

                  Terry

                    A Former User @Terry R

                    @Terry-R This does seem to work as intended, at least with my limited testing. Thank you very much for your quick replies. I have never really familiarized myself with lookaheads, they certainly look useful though.

                      Terry R

                      @Anos said in Find and Replace: Multiple Replacements in Part of a String:

                      I have never really familiarized myself with lookaheads

                      There are LOTS of wonderful things to try and remember, as @PeterJones just reminded me. I should have made that a non-capturing group; then it would not have required a rejig of the Replace With code.

                      As I always say
                      “The day you stop learning is the day you die”

                      Terry

                        A Former User @Alan Kilborn

                        @Alan-Kilborn Thank you for this solution. This also gets the job done, and as @Terry-R points out it seems to be more efficient.

                          guy038

                          Hello, @anos, @terry-r, @alan-kilborn, @peterjones and All,

                          And here is my solution !

                          If we use the FREE-SPACING mode (?x), for the SEARCH part :
                          
                          SEARCH  (?x-s)  (?: ( \+A (\*)? ) | ( \+B (\*)? ) | (2) )  (?!.*,)
                          Groups -->      No  1     2         3     4         5     Look-Ahead  
                          
                          REPLACE (?1(?{2}1:a))(?3(?{4}2:b))?5X
                          
                          BEWARE that, in the REPLACE part, the FREE-SPACING mode is FORBIDDEN. So, ONLY for INFO :
                          
                          REPLACE ( ?1 ( ?{2} 1 : a ) )  ( ?3 ( ?{4} 2 : b ) ) ?5 X
                          

                          and given the data :

                          12345-01, A+A*2B+B*+A
                          12345-02, A+AB+B*+AA
                          

                          it would return :

                          12345-01, A1XB2a
                          12345-02, AaB2aA
                          

                          Notes :

                          • The first part (?x-s) of the regex search means that :

                            • The free-spacing mode is set ( Spaces are not taken into account, except for the [ ] syntax or an escaped space char )

                            • Due to (?-s) syntax, the dot regex symbol matches a single standard char only ( not an EOL char )

                          • Then, the (?:......) syntax defines a non-capturing group

                          • Now, in this non-capturing group, we have 3 alternatives and the first two contain an optional inner group (\*)? ( Remember that the ? is another form of the {0,1} quantifier )

                          • To end, everything matched so far counts ONLY IF the final negative look-ahead structure (?!.*,) is verified, that is to say, if there is no comma at any further position in the current line, starting from the position reached by the regex engine

                          • Now, in the replacement regex :

                            • The (?1(?{2}1:a)) syntax means that if group 1 exists, then if group 2 exists, then write 1 else write a

                            • The (?3(?{4}2:b)) syntax means that if group 3 exists, then if group 4 exists, then write 2 else write b

                            • Finally, the ?5X means that if group 5 exists, then write an X ( The parentheses are not mandatory as this part ends the regex )

                            • Note also that it’s not necessary to surround the groups 1, 3 and 5 with braces as these groups are not immediately followed by a digit !
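
The nested if/else logic of the notes above can be sketched in Python for testing outside Notepad++. This is an approximation under assumptions: Python's `re.VERBOSE` plays the role of (?x), the (?-s) part is unnecessary because "." stops at newlines by default, and since Python has no conditional replacement syntax, a function (here called `conditional_replacement`, a made-up name) spells out the same decisions.

```python
import re

PATTERN = re.compile(
    r"""
    (?:
        ( \+A (\*)? )    # group 1: +A, group 2: the optional star
      | ( \+B (\*)? )    # group 3: +B, group 4: the optional star
      | (2)              # group 5: a literal 2
    )
    (?! .* , )           # no comma further along the current line
    """,
    re.VERBOSE,
)


def conditional_replacement(match):
    if match.group(1) is not None:    # (?1 ... )
        return "1" if match.group(2) is not None else "a"   # (?{2}1:a)
    if match.group(3) is not None:    # (?3 ... )
        return "2" if match.group(4) is not None else "b"   # (?{4}2:b)
    return "X"                        # ?5X


data = "12345-01, A+A*2B+B*+A\n12345-02, A+AB+B*+AA"
print(PATTERN.sub(conditional_replacement, data))
```

An optional group that did not take part in the match is None in Python, which corresponds to "group n does not exist" in the replacement syntax described above.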

                          Best Regards,

                          guy038
