Replacing a digit with exception

    Jean-Francois Trehard
    last edited by Jan 31, 2021, 1:15 AM

    Hello everyone,
    I’m totally new here and also new to Notepad++ advanced usage.
    For many hours I tried to find a solution to my problem, without any clue and I hope there is a solution.

    I have a file with almost 9000 lines and I’d like to replace numbers with an exception.
    Here is an example of what the file looks like :

    dc.l $02220620 ; Tile #42
    dc.l $01012021
    dc.l $55111003 ; Tile #956
    dc.l $55001111
    dc.l $54000116 ; Tile #1628
    dc.l $00022222

    Here, I want to replace all the digits “2”, except the ones that are after the sharp “#”.
    With my little experience, I didn’t go far, even trying to find a solution on the net.
    Is there a way to search and replace only within a specified string ? For example, only within the 8-character string after the “$”, or only in everything before the “;” ?
    Or could it be better just to search the “2” and make a replace with an exception (if we can) ?

    For example I tried this to avoid selecting after the “#” :
    (2)(.{1,8})
    Then replace with :
    \Q\2
    But it’s only replacing the first “2” and going to the next line…

    Any help would be welcomed :)
    Thank you

      Terry R
      last edited by Jan 31, 2021, 2:03 AM

      @Jean-Francois-Trehard said in Replacing a digit with exception:

      Here, I want to replace all the digits “2”, except the ones that are after the sharp “#”.

      I wasn’t sure what you intended replacing the 2 with so I included the Q character, I know it’s obviously not what you want, so replace that character with what you need to in the “Replace With” field, or if nothing then just remove the Q character.

      So in essence we are looking for a ; followed by some more characters until the end of the line. This search occurs at every character encountered, so if the current character(s) do not match then the regex uses the ‘alternate’ option, find a 2. This allows it to move across the line character by character. Once it encounters the ;, it grabs the remainder of the line and returns that. When it finds the 2, it currently replaces with a Q.
      So this means it can never encounter a 2 past the ; in the line as the first portion of regex will be true and that takes precedence.

      Using the “Replace” function we have
      Find What:(?-s)(;.*\R)|(2)
      Replace With:(?1\1)(?2Q)

      Hopefully this solves your issue and the description I provided will help you understand the logic behind it.

      Terry

        Terry R
        last edited by Jan 31, 2021, 4:49 AM

        @Terry-R said in Replacing a digit with exception:

        Find What:(?-s)(;.*\R)|(2)

        A minor refinement, in case the last line of the file contains the ; Tile # xxx string. With my first regex, any 2 in that last Tile number would still be replaced, because \R needs a line break to match and the last line may not end with one. So the replacement Find What is (?-s)(;.*$)|(2). Note I replaced the \R with the $ only.
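For anyone who wants to check this logic outside Notepad++, here is a minimal Python sketch of the same "match the comment first, otherwise match the 2" idea. It is not the Notepad++ engine: Python's re module has no (?1...) conditional replacement, so a callback (the hypothetical helper pick) stands in for (?1\1)(?2Q), and the function name replace_outside_comments is made up for illustration.

```python
import re

# Group 1 swallows a comment (from ";" to the end of the line);
# group 2 is a bare "2" found anywhere before a comment.
PATTERN = re.compile(r"(;.*$)|(2)", re.MULTILINE)


def replace_outside_comments(text: str, replacement: str = "Q") -> str:
    """Replace every 2 that is not inside a ';' comment."""
    def pick(match):
        # Comment matched: put it back unchanged, like (?1\1).
        if match.group(1) is not None:
            return match.group(1)
        # Otherwise the lone 2 matched, like (?2Q).
        return replacement

    return PATTERN.sub(pick, text)


sample = "dc.l $01012021\ndc.l $02220620 ; Tile #42"
print(replace_outside_comments(sample))
```

The sample deliberately ends with a commented line and no trailing line break, which is the case the $ version is meant to cover.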

        Terry

          guy038
          last edited by guy038 Jan 31, 2021, 12:21 PM Jan 31, 2021, 12:19 PM

          Hello, @jean-francois-trehard, @terry-R and All,

          If your need is to search for any 2 digit, in the first range of consecutive digits, after the $ symbol, and replace it with the string Q2, a possible regex S/R would be :

          SEARCH (\$|\G)\d*?\K2

          REPLACE Q2

          So, from your sample

          dc.l $02220620 ; Tile #42
          dc.l $01012021
          dc.l $55111003 ; Tile #956
          dc.l $55001111
          dc.l $54000116 ; Tile #1628
          dc.l $00022222
          

          we get this text :

          dc.l $0Q2Q2Q206Q20 ; Tile #42
          dc.l $0101Q20Q21
          dc.l $55111003 ; Tile #956
          dc.l $55001111
          dc.l $54000116 ; Tile #1628
          dc.l $000Q2Q2Q2Q2Q2
          

          As you can see, the digits 2 have not been changed after any ; tile # area. Do you expect this output ?
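The result above can be reproduced outside Notepad++ with a rough Python sketch. Python's re module supports neither \G nor \K, so this swaps in a different technique: it matches the whole run of digits right after the $ and rewrites the 2s inside that run. The function name mark_twos_after_dollar is hypothetical, and the sketch ignores the caret-position behaviour of \G.

```python
import re


def mark_twos_after_dollar(text: str) -> str:
    """Turn each 2 into Q2, but only in the digit run right after a '$'."""
    # (?<=\$) anchors the match just after a literal dollar sign;
    # \d+ then takes the contiguous decimal digits that follow it.
    return re.sub(
        r"(?<=\$)\d+",
        lambda m: m.group(0).replace("2", "Q2"),
        text,
    )


print(mark_twos_after_dollar("dc.l $02220620 ; Tile #42"))
```

Because the run stops at the first non-digit character, the 2 in the Tile number is never reached.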


          By the way, are you French ? If so, next time, I could answer in both languages ;-)

          Best Regards,

          guy038

            Jean-Francois Trehard
            last edited by Jan 31, 2021, 12:44 PM

            Oh many thanks to both of you !!! I’ll manage to understand all your scripts.

            My intention is to be able to change the same digit in all the 8 characters strings after the “$” with another digit.
            @Terry-R, it works great.
            @guy038, in your example, the result is adding a “Q” before the “2” and not replacing it.

            • I’ll also try to add the condition to find only after the “$” because the data in those 8 characters strings can be from 0 to F (in hexadecimal) and before this “$” we have the characters C and D.

            Have a nice day :)

              Jean-Francois Trehard @guy038
              last edited by Jan 31, 2021, 12:44 PM

              @guy038 yes, I am French :)

                guy038
                last edited by Jan 31, 2021, 2:01 PM

                Hi, @jean-francois-trehard, @terry-R and All,

                I still do not understand exactly what you want to replace the number 2 with. Do you need to insert backslashes \, too, in replacement ?

                Seeing the @terry-r answer, it seems that you want to replace any allowed 2 digit with the Q letter. Am I right about it ?

                Now, if you want to deal with hexadecimal numbers, no problem !

                Use the following regex S/R :

                SEARCH (\$|\G)[[:xdigit:]]*?\K2

                REPLACE Q

                Next time, if needed, I could give you some explanations about this S/R !

                Best regards,

                guy038



                  Jean-Francois Trehard @guy038
                  last edited by Jan 31, 2021, 5:14 PM

                  @guy038 It works great, thank you, no more issue before the $ :)
                  I now have to understand and learn all what you both showed to me.

                  In general, I just want to be able to replace one given character with another in all the 8-character strings right after the “$” in my file, which is almost 9000 lines long in total. The characters can range from 0 to 9 and from A to F.
                  Your last answer means I no longer have any problem before the $, great :)

                    guy038
                    last edited by guy038 Jan 31, 2021, 9:17 PM Jan 31, 2021, 9:16 PM

                    Hi, @jean-francois-trehard and All,

                    While trying to explain the search regex provided in my previous post, I realized that we should add an assumption :

                    • You must move the caret to a blank line, located before the text to be processed by the S/R

                    • Globally, the regex (\$|\G)[[:xdigit:]]*?\K2, searches, from after a literal $ symbol OR after the end of the previous search, if any, for the smallest range, even null, of hexadecimal characters, till a 2 digit and only selects that digit 2 !

                    • Note that the \G assertion forces the regex engine to ask itself : does the current match attempt, immediately follows the previous match ? In case of a positive answer, the current match attempt is a possible match, so far.

                    • As this regex only looks for hexadecimal chars, it’s easy to understand why digits, located, in the second part of a line, after the #, are not concerned !


                    Now, if you’re still in the fog, let’s go into a little more detail and start the search, for instance, with caret on an empty line, before the first line dc.l $02220620 ; Tile #42 to process.

                    All the stuff, below, is a bit off-putting but it goes like this, if you break the process down into more basic actions !


                    • The regex engine first searches for a range of hexadecimal characters, either, after a literal $ or after the end of the previous search. As no search has been performed so far, the \G syntax matches, by default, at current position, the beginning of the empty line, which is a zero-length string

                    • But, no hexadecimal char exists in that empty line, so the regex engine skips the EOL chars and moves to the beginning of the next line dc.l $02220620 ; Tile #42. You could say : it will match the dc string, which are hexadecimal chars, also ? No, no ! Because the initial location, in the empty line, and the dc location are not contiguous. Indeed, there is the gap of the two chars : \r and \n

                    • So, necessarily, the regex engine moves forward, 3 positions and, finally, matches the first alternative, the literal $ , as well as the 0 and the first 2 digit

                    • The \K feature cancels any match, so far and resets the regex engine working position => So, it only matches the first 2 digit and replaces it with the string Q

                    • Now, as no more $ symbol exists, the regex engine needs to use the \G assertion, which represents the location right between the string Q and the next 2 digit. As the hexadecimal range of chars, before a next 2, may be null, the regex engine just matches this empty zone and the second 2 digit and, again, selects only the digit 2 and replaces it with the Q string

                    • The third 2 of current line is matched in the same way as above and replaced with the letter Q

                    • Then, the regex engine matches the range of hexadecimal chars 062, right after the Q letter, which verifies the \G assertion and selects only the fourth digit 2 of current line, due to the \K syntax and, again, replaces it with a Q letter

                    • Now, things become interesting : the regex engine must find a next range of hexadecimal chars, possibly null, ending with a 2 digit. But this next range of contiguous hexa chars is the string 42, located some chars after the end of the previous search ! So the \G assertion is not verified anymore and, as there is no other $ symbol, either, the overall search fails. Thus, the regex engine skips the remaining chars of that first line and moves to the beginning of the second line and the process resumes !

                    • And, if a line just ends with a first range of hexadecimal chars, it will search, next, for a literal $ symbol, as the \G feature is not true, due to the gap of the EOL characters of current line


                    As you may notice, the key point, in that story, is that, when several ranges of hexadecimal characters exist, throughout a line, these are not juxtaposed. So, the \G assertion forces, automatically, the process to cancel any search after examination of each first range of hexadecimal chars of each line ;-))
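The contiguity rule described above can be traced by hand with a small Python sketch. It does not use a regex at all: it walks the characters after the first $ of a line and stops at the first non-hexadecimal character, which plays the role of the failed \G check. The names trace_matches and HEX_DIGITS are hypothetical, and the sketch assumes at most one $ per line, as in the sample data.

```python
from typing import List

HEX_DIGITS = set("0123456789abcdefABCDEF")


def trace_matches(line: str, target: str = "2") -> List[int]:
    """Return the indices of `target` inside the hex run that follows '$'."""
    hits: List[int] = []
    start = line.find("$")
    if start == -1:
        return hits  # no '$' on this line, so nothing can match
    pos = start + 1
    # Keep walking while each character is contiguous with the previous one
    # and is a hex digit; the first other character ends the chain.
    while pos < len(line) and line[pos] in HEX_DIGITS:
        if line[pos] == target:
            hits.append(pos)
        pos += 1
    return hits


print(trace_matches("dc.l $02220620 ; Tile #42"))
```

The leading dc and the trailing 42 contain hex characters too, but neither sits in the run that starts at the $, so neither is reported.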

                    Wow ! Glad to see that you’re still there, after these long explanations ;-)) Thank you for your patience and full attention !


                    Jean-François, I don’t really feel like going through this whole spiel again in French ;-) If some points still seem unclear to you, I can always help you later !

                    Best regards,

                    guy038

                      Jean-Francois Trehard
                      last edited by Jan 31, 2021, 9:32 PM

                      @guy038 thank you.
                      I’m going to read this carefully.
                      Actually I have to change your following code (?-s)($|\G)[[:xdigit:]]*?\K2 because it does not work if there are already digits in the 8 characters strings : the search is going to the next line each time it encounters a letter, even if it didn’t read the entire string.
                      I have to scratch my head a little bit to try to find a solution, I’ll tell you if I find a solution :)

                        guy038
                        last edited by Jan 31, 2021, 9:49 PM

                        @jean-francois-trehard and All,

                        I don’t understand ! In the picture showing the text, below, I added some pure hexadecimal characters, either, in lower or upper case, in many locations, even after the ; Tile # string. Seemingly, all 2 digits, before comments only, are correctly marked !?

                        dc.l $02A20620 ; Tile #42
                        dc.l $0BaC2021
                        dc.l $55111003 ; Tile #9A56
                        dc.l $55001111
                        dc.l $54000116 ; Tile #1b628
                        dc.l $dF022222
                        

                        [Screenshot: the sample text above with the matched 2 digits marked]


                        Jean-françois, could you provide some text which breaks the logic down ?

                        BR

                        guy038

                          Jean-Francois Trehard @guy038
                          last edited by Jean-Francois Trehard Jan 31, 2021, 10:01 PM Jan 31, 2021, 10:01 PM

                          @guy038 oh I wrote something wrong :
                          “because it does not work if there are already digits in the 8 characters strings”
                          I meant this : “because it does not work if there are already letters in the 8 characters strings”

                            guy038
                            last edited by guy038 Jan 31, 2021, 10:42 PM Jan 31, 2021, 10:15 PM

                            @jean-francois-trehard and All,

                            Ah OK, I understood the problem. Actually, you meant that, once an S/R has been processed to replace, for instance, a 2 digit with the Q letter, a second search, for instance, of the 6 digit will not find all occurrences of the 6 digit, because the letter Q is not a hexadecimal digit !

                            Indeed, the regex must be changed as below :

                            SEARCH (\$|\G)\w*?\K2

                            REPLACE Q


                            BTW, note that we can, also, express the search regex, using the free-spacing mode (?x), for a better readability :

                            (?x)   ( \$ | \G )   \w   *?   \K   2
                            

                            Now, concerning the explanations, in my previous post, simply change any string “hexadecimal character” with the string “word character” !!
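To see why the change from [[:xdigit:]] to \w matters on a second pass, here is a hedged Python sketch. As Python's re module lacks \G and \K, it matches the whole run after the $ and replaces inside it; the helper replace_in_run and its run_class parameter are made up for this illustration.

```python
import re


def replace_in_run(text: str, run_class: str, old: str, new: str) -> str:
    """Replace `old` with `new` inside the run of `run_class` chars after '$'."""
    pattern = re.compile(r"(?<=\$)" + run_class + "+")
    return pattern.sub(lambda m: m.group(0).replace(old, new), text)


line = "dc.l $02220620 ; Tile #42"

# First pass: 2 -> Q. Every character of the run is still a hex digit here.
first = replace_in_run(line, r"[0-9A-Fa-f]", "2", "Q")

# Second pass with the hex class: the run now ends at the first Q,
# so the 6 further along is never reached.
hex_second = replace_in_run(first, r"[0-9A-Fa-f]", "6", "X")

# Second pass with \w: Q is a word character, so the run covers all 8 chars.
word_second = replace_in_run(first, r"\w", "6", "X")

print(first)
print(hex_second)
print(word_second)
```

The letter X is only a stand-in for whatever the second replacement should be.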

                            Cheers,

                            guy038

                            P.S.

                            I also think that my previous explanations need to be reread. I’ll see tomorrow !

                              Jean-Francois Trehard
                              last edited by Jean-Francois Trehard Jan 31, 2021, 10:38 PM Jan 31, 2021, 10:38 PM

                              That is awesome, thank you. This is very complex for a pure beginner.
                              Do you know if we can do multiple replacements at once in Notepad++ ?
                              Like replacing all my 0 with G, all my 1 with H, etc, in one “Replace all”.
                              I’m now checking it :)

                                Jean-Francois Trehard
                                last edited by Jan 31, 2021, 10:45 PM

                                This is not working >
                                Find : (($|\G)\w*?\K0)|(($|\G)\w*?\K1)
                                Replace : (?1G)(?2H)

                                If I have 01001110, then I get GH1GHGH111GH

                                  Terry R
                                  last edited by Terry R Jan 31, 2021, 10:47 PM Jan 31, 2021, 10:46 PM

                                  @Jean-Francois-Trehard said in Replacing a digit with exception:

                                  Do you know if we can do multiple replacements at once in Notepad++ ?

                                  If you go with my regex, altered to cater for the different changes to be made you can do it ALL in 1 pass.

                                  So the new regex is
                                  Find What:(?-s)(;.*$)|(0)|(1)|(2)|(3)|(4)
                                  Replace With:(?1\1)(?2G)(?3H)(?4I)(?5J)(?6K)

                                  This is just an example as you have not provided ALL the changes you want, but the idea is the same. My regex can be extended to cater for as many characters as you want changed in the one run.
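As a rough cross-check outside Notepad++, the one-pass idea can be sketched in Python. Python's re module has no (?N...) conditional replacement, so a dictionary lookup inside a callback stands in for (?2G)(?3H)... The names MAPPING and remap_outside_comments are hypothetical, and the mapping only covers the five digits shown in the example.

```python
import re

# One entry per character to change; extend as needed.
MAPPING = {"0": "G", "1": "H", "2": "I", "3": "J", "4": "K"}

# Group 1 swallows a comment up to the end of the line;
# group 2 is any single character that has an entry in MAPPING.
PATTERN = re.compile(r"(;.*$)|([0-4])", re.MULTILINE)


def remap_outside_comments(text: str) -> str:
    """Apply MAPPING to every listed digit that is not inside a ';' comment."""
    def pick(match):
        if match.group(1) is not None:
            return match.group(1)  # comment: keep it as it is
        return MAPPING[match.group(2)]  # digit: look up its replacement

    return PATTERN.sub(pick, text)


print(remap_outside_comments("dc.l $54000116 ; Tile #1628"))
```

With a lookup table, adding more characters does not add more capture groups, so the question of numbering groups past 9 does not arise in this version.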

                                  Terry

                                    Terry R
                                    last edited by Terry R Jan 31, 2021, 10:55 PM Jan 31, 2021, 10:53 PM

                                    @Terry-R said in Replacing a digit with exception:

                                    (?1\1)(?2G)(?3H)(?4I)(?5J)(?6K)

                                    Actually just thinking that you possibly need to change more than 9 groups (in total), the more correct coding would be:
                                    (?{1}\1)(?{2}G)(?{3}H)(?{4}I)(?{5}J)(?{6}J)
                                    So this allows for lots of groups to be identified. In your case you would be using …(?{10}X)(?{11}Y)(?{12}Z) as an example.

                                    Terry

                                      guy038
                                      last edited by guy038 Jan 31, 2021, 11:12 PM Jan 31, 2021, 11:06 PM

                                      Hi, @jean-francois-trehard, @terry-r and All,

                                      No problem too. We can do miracles with regexes ;-))

                                      This time :

                                      • We must use a non-capturing group at the very beginning of the regex

                                      • We must add a second non-capturing group to get the \K feature for search of, either, the digits 0, 1 or 2, only

                                      • We must add inner real groups in order that :

                                        • Group 1 is defined when a 0 digit is matched

                                        • Group 2 is defined when a 1 digit is matched

                                        • Group 3 is defined when a 2 digit is matched

                                      And the replacement zone is self-explanatory !

                                      So, given this text :

                                      dc.l $02A20620 ; Tile #42
                                      dc.l $0BaC2021
                                      dc.l $55111003 ; Tile #9A56
                                      dc.l $55001111
                                      dc.l $54000116 ; Tile #1b628
                                      dc.l $dF022222
                                      

                                      The following regex S/R :

                                      SEARCH (?:\$|\G)\w*?\K(?:(0)|(1)|(2))

                                      REPLACE (?1G)(?2H)(?3I)

                                      will output, in one pass, the expected text, leaving the comments untouched

                                      dc.l $GIAIG6IG ; Tile #42
                                      dc.l $GBaCIGIH
                                      dc.l $55HHHGG3 ; Tile #9A56
                                      dc.l $55GGHHHH
                                      dc.l $54GGGHH6 ; Tile #1b628
                                      dc.l $dFGIIIII
                                      

                                      Again, if we use the (?x) modifier for the free-spacing behavior, the search regex becomes :

                                      (?x)  (?: \$ | \G )  \w  *?  \K  (?:  (0) | (1) | (2)  )
                                      

                                      However, note that the replacement regex cannot be expressed with the free-spacing mode, but, for this specific replacement, the order of the (?#<letter>) blocks does not matter !

                                      So, the replacement regex (?3I)(?2H)(?1G) would be correct, as well !
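The expected output above can be checked with a short Python sketch. It is an emulation under stated assumptions, not the same regex: Python's re module has neither \G nor \K, so the sketch matches the whole run of word characters right after the $ and translates the characters inside it. The names MAPPING and remap_after_dollar are hypothetical.

```python
import re

MAPPING = {"0": "G", "1": "H", "2": "I"}


def remap_after_dollar(text: str) -> str:
    """Apply MAPPING only inside the word-character run that follows a '$'."""
    table = str.maketrans(MAPPING)
    # (?<=\$) starts the match right after a literal '$';
    # \w+ takes the contiguous word characters, so it stops at the space.
    return re.sub(r"(?<=\$)\w+", lambda m: m.group(0).translate(table), text)


sample = """dc.l $02A20620 ; Tile #42
dc.l $0BaC2021
dc.l $55111003 ; Tile #9A56"""
print(remap_after_dollar(sample))
```

As in the replacement regex, the order of the entries in MAPPING does not matter, because each character is looked up independently.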

                                      BR

                                      guy038

                                        Jean-Francois Trehard
                                        last edited by Feb 1, 2021, 8:18 AM

                                        @guy038, @Terry-R,
                                        Thanks to both of you, this is working great now !
                                        Have a nice day :)

                                          guy038
                                          last edited by guy038 Feb 2, 2021, 5:56 PM Feb 2, 2021, 5:54 PM

                                          Hello, @jean-francois-trehard and All,

                                          While trying to explain my regex S/R, provided in my previous post, I realized that we should add an assumption :

                                          • You must move the caret to a blank line, located before the text to be processed by the S/R

                                          So, let’s start with the last version :

                                          SEARCH (?:\$|\G)\w*?\K(?:(0)|(1)|(2))

                                          REPLACE (?1G)(?2H)(?3I)


                                          • Globally, the regex (?:\$|\G)\w*?\K(?:(0)|(1)|(2)), searches, from after a literal $ symbol OR after the end of the previous search, if any, for the smallest range, even empty, of word characters, till a 0, 1 or 2 digit, then only selects this last digit and replaces it, respectively, with the G, H or I letter

                                          • Note that the \G assertion forces the regex engine to ask itself : does the current match attempt immediately follow the previous match ? In case of a positive answer, the current match attempt is a possible match, so far !

                                          • As this regex only looks for consecutive word chars, it’s easy to understand why the characters, located, in the second part of a line, after the # sign, are not concerned !


                                          Now, if you’re still in the fog, let’s go into a little more detail and start the search, for instance, with caret on an empty line, right before the first line dc.l $02220620 ; Tile #42

                                          Beware, everything below is a bit “off-putting” but, in any case, it happens like that if you break the process down into smaller basic actions !


                                          • The regex engine first searches for a range of consecutive word characters till a 0, 1 or 2 digit, either, after a literal $ or after the end of the previous search. As no search has been performed, so far, the \G syntax matches, by default, at current position, the beginning of the empty line, which is a zero-length string

                                          • As no word char exists in that empty line, the regex engine skips the EOL chars and moves to the beginning of the next line dc.l $02220620 ; Tile #42. You could say : it should match the dc string, which are word chars, also ? No, no ! Because the initial location, in the empty line, and the dc location are not contiguous. Indeed, there is a gap of the two chars : \r and \n

                                          • Thus, the \G assertion is not true, presently. So the regex engine tries to match the first alternative and skips to the literal $ and the next 0 digit to search for

                                          • The \K feature cancels any match, so far and resets the regex engine working position => So, it only matches this first 0 digit. Remember that this configuration is possible as the range of chars before digit 0, 1 or 2 to search for, may be empty

                                          • As the group 1 is defined, when matching the 0 digit, the regex engine replaces it with the string G

                                          • Now, as no more $ symbol exists, the regex engine needs to use the second alternative, the \G assertion, which represents the location right between the letter G and the next range of word chars till a 0, 1 or 2 digit. So it just matches the first 2 digit, right after and, again, only selects that digit 2

                                          • As the group 3 is defined when matching the 2 digit, the regex engine replaces it with the string I

                                          • The second and third 2 digits, of current line, as well as the second 0 digit, are matched, in the same way as above, and replaced, consequently, with the appropriate letter

                                          • Then, the regex engine matches the 62 digit, right after the previous G letter, which verifies the \G assertion and selects only the fourth digit 2 of current line, due to the \K syntax

                                          • Again, this 2 digit is replaced with an I letter, as group 3 is defined when matching the 2 digit

                                          • The regex engine advances one position and matches the last 0 digit of current line and replaces it with the G letter

                                          • Now, things become interesting : the regex engine must find a next range of consecutive word chars, possibly empty, ending with, either, a 0, 1 or 2 digit. Obviously, this next range of contiguous word chars is the string 42, located some chars after the end of our previous search ! So the \G assertion is not verified anymore and, as there is no other $ symbol, either, the overall search fails. Thus, the regex engine skips the remaining chars of that first line, after the third 0 digit, which has been changed into G and moves to the beginning of the second line where the process resumes !

                                          • And, when a line just ends with a first range of word chars, without any comments zone, it, necessarily, will search for a literal $ symbol, as the \G feature cannot be true, due to the gap produced by the EOL characters of current line !


                                          As you may notice, the key point, in this kind of data, is that the several ranges of word characters are not juxtaposed. So, the \G assertion forces, automatically, the process to cancel any further search after examination of each first range of consecutive word chars of each line, following a $ symbol ;-))

                                          Note also that, due to the \K syntax, inside this search regex, you cannot use a step by step replacement with several clicks on the Replace button. However, you can use the Find Next button, to get the different matches


                                          Wow ! Glad to see that you’re still there, after these long explanations ;-)) Thank you for your patience and full attention !

                                          Best Regards,

                                          guy038
