    Find & Replace issues

      Scott Sumner

      @Kirk-Weir said:

      so please treat me as pretty dumb!

      Nah…nobody that is dumb could express what you are trying to do that well. :-D

      Ok, so let’s go:

      Find-what box: hyperlink="history:/MS_MSCP_3_P2_2/.+?(\d)?(\d)?(\d)\|view:history:HistoryChart">
      Replace-with box: hyperlink="history:/SITE00002_L011_O062/P(?2\1\2\3:0(?1\1\3:0\3))/view:history:HistoryChart">
      Search mode: ☑ Regular expression -AND- ☐ . matches newline

      Really the only tricky part is zero-padding a number to three digits. Here’s how that is done. The find-what expression will capture one, two or three digits as follows:

      • if one digit is present, it will be remembered as group-3 (group-1 and group-2 will be undefined)
      • if two digits are present, they will be remembered (first digit as group-1, second digit as group-3, group-2 will be undefined)
      • if three digits are present, they will be remembered, in order, as group-1, group-2, and group-3

      This part of the replace regex:

      (?2\1\2\3:0(?1\1\3:0\3))

      does the following:

      • if group-2 is defined, the original match had 3 digits, so insert the remembered group-1 digit followed by the group-2 digit and the group-3 digit; we are DONE
      • if group-2 is NOT defined, insert a 0, and then:
        • if group-1 is defined, the original match had 2 digits, so insert the remembered group-1 digit and the group-3 digit; we are DONE
        • if group-1 is NOT defined, the original match had one digit, so insert another 0 (for a total of two zeroes) and then the remembered group-3 digit
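
For anyone who wants to try the same three-way logic outside Notepad++, here is a minimal Python sketch. Python's re module has no conditional replacement syntax like (?2...:...), so the branching is written as a replacement function instead. The names PATTERN, pad3 and pad_all are invented for illustration, and the sketch assumes every number in the text has at most three digits.

```python
import re

# Same capture layout as the find-what expression: up to three digits,
# with the last digit always landing in group 3.
PATTERN = re.compile(r"(\d)?(\d)?(\d)")


def pad3(match):
    """Mirror the nested conditional (?2\\1\\2\\3:0(?1\\1\\3:0\\3))."""
    g1, g2, g3 = match.group(1), match.group(2), match.group(3)
    if g2 is not None:
        # Three digits were matched: write them back unchanged.
        return g1 + g2 + g3
    if g1 is not None:
        # Two digits were matched: one leading zero is needed.
        return "0" + g1 + g3
    # A single digit was matched: two leading zeros are needed.
    return "00" + g3


def pad_all(text):
    """Zero-pad every 1-3 digit number found in text (illustrative helper)."""
    return PATTERN.sub(pad3, text)
```

Calling pad_all on a fragment such as C7| pads the digit to three places; the surrounding hyperlink text from the original expressions is left out here to keep the sketch short.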
        guy038

        Hello, @kirk-weir and @scott-sumner,

        Concerning the main changes :

        ###  =>  P###
        ##   =>  P0##
        #    =>  P00# 
        

        Scott, I found an alternative solution to your S/R :

        SEARCH (\d)?(\d)?(\d)

        REPLACE P(?2\1\2\3:0(?1\1\3:0\3))

        Instead, I propose the following regex S/R :

        SEARCH ((\d)?(\d)?\d)

        REPLACE P(?2:0)(?3:0)\1

        Notes :

        • If groups 2 and 3 are both defined ( number with 3 digits ), we rewrite the P string only, followed by the number ( Group 1 )

        • If group 2 is defined and group 3 is NOT defined ( number with 2 digits ), we rewrite the P0 string, followed by the number ( Group 1 )

        • If groups 2 and 3 are both NOT defined ( number with 1 digit ), we rewrite the P00 string, followed by the number ( Group 1 )
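
The three cases in these notes can be checked with a small Python sketch. It is only an approximation of the Notepad++ behaviour: Python has no (?2:0) replacement syntax, so each "if the group is undefined, write a 0" test becomes an explicit check. SEARCH, replace_p and pad_numbers are hypothetical names.

```python
import re

# Group 1 is the whole number; groups 2 and 3 are the optional leading digits.
SEARCH = re.compile(r"((\d)?(\d)?\d)")


def replace_p(match):
    """Emulate the replacement P(?2:0)(?3:0)\\1 with explicit checks."""
    pad = ""
    if match.group(2) is None:
        pad += "0"  # group 2 undefined: one zero is missing
    if match.group(3) is None:
        pad += "0"  # group 3 undefined: another zero is missing
    return "P" + pad + match.group(1)


def pad_numbers(text):
    return SEARCH.sub(replace_p, text)
```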


        Now, I think that the first part hyperlink="history:/ and the last part |view:history:HistoryChart">, of each line, do NOT need to be part of the S/R regexes, as they are unchanged, after replacement. So, we could, simply, use the following S/R :

        SEARCH (?-si)MS_MSCP_3_P2_2/.+C((\d)?(\d)?\d)

        REPLACE :SITE00002_L011_O062/P(?2:0)(?3:0)\1

        As usual, the (?-si) modifiers, at the beginning of the search regex, mean that :

        • The dot, a special regex character, stands for any single standard character only, never an End of Line character ( -s )

        • The regex engine searches in a case-sensitive way ( -i )
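
As a point of comparison, and only as a sketch in another regex flavour, both behaviours are the default in Python's re module. The short example below shows what changes when each one is switched on; the sample text and variable names are made up.

```python
import re

text = "abc\nABC"

# Default: the dot stops at the line break and matching is case-sensitive.
default_hits = re.findall(r"a.*", text)

# With DOTALL (the equivalent of the s modifier) the dot also matches "\n".
dotall_hits = re.findall(r"a.*", text, re.DOTALL)

# With IGNORECASE (the equivalent of the i modifier) "a" also matches "A".
ignorecase_hits = re.findall(r"a.*", text, re.IGNORECASE)
```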

        Best Regards,

        guy038

          Scott Sumner @guy038

          @guy038

          I like your regex for this better than mine. I don’t really care that it is shorter; I like it because yours is more extensible (easier to see how to extend it to zero-pad out to 4, 5, etc. digits).

          As to eliminating the “extra” text from the search and replace string, this is a tough call. In proposing a solution, I don’t like to assume too much (but sometimes you have to!) about what the questioner is doing; thus I err on the side of caution and leave in stuff like that. It should be pretty easy for the questioner to decide to remove that extra plaintext from their specific use case, if appropriate, once they start experimenting with solutions posed. :-D

            guy038

            Hi, @scott-sumner,

            I didn’t realize that my solution could be extended to any amount of digits ! Thanks, Scott, for pointing this fact out :-D

            Indeed, let’s try with a number, which may have between 1 and 8 digits. And let’s suppose that we get rid of the capital letter P, beginning any number. Therefore, the corresponding regex S/R is :

            SEARCH ((\d)?(\d)?(\d)?(\d)?(\d)?(\d)?(\d)?\d)

            REPLACE (?2:0)(?3:0)(?4:0)(?5:0)(?6:0)(?7:0)(?8:0)\1

            And, applying this S/R against the subject numbers, below :

            12345
            123456
            123
            1234567
            1234
            12
            12345678
            1
            

            It would give the well lined up list of numbers, below :

            00012345
            00123456
            00000123
            01234567
            00001234
            00000012
            12345678
            00000001
            

            And, with the Replace regex (?2: )(?3: )(?4: )(?5: )(?6: )(?7: )(?8: )\1, we would obtain the same list, padded out with some space characters, as below :

               12345
              123456
                 123
             1234567
                1234
                  12
            12345678
                   1
            

            And, with the Replace regex (?2:.)(?3:.)(?4:.)(?5:.)(?6:.)(?7:.)(?8:.)\1, this time, the list would be padded out with some dot characters, as below :

            ...12345
            ..123456
            .....123
            .1234567
            ....1234
            ......12
            12345678
            .......1
            

            Finally, this general padding method may be helpful to some of us :-))
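
The general method can also be sketched in Python, building the search regex for any width instead of typing the optional groups by hand. This is an illustration under assumptions: build_padder is an invented helper, the conditional replacement is emulated by counting undefined groups, and numbers longer than the chosen width are not handled.

```python
import re


def build_padder(width, fill="0"):
    """Return a function that pads every number in a text to `width` characters."""
    # Group 1 is the whole number, groups 2..width are the optional leading digits,
    # e.g. width=3 gives ((\d)?(\d)?\d).
    pattern = re.compile("(" + r"(\d)?" * (width - 1) + r"\d)")

    def repl(match):
        # Each undefined optional group stands for one missing leading character.
        missing = sum(1 for i in range(2, width + 1) if match.group(i) is None)
        return fill * missing + match.group(1)

    def pad(text):
        return pattern.sub(repl, text)

    return pad


numbers = "12345\n123456\n123\n1234567\n1234\n12\n12345678\n1"
zero_padded = build_padder(8)(numbers)
dot_padded = build_padder(8, ".")(numbers)
```

In everyday Python code, str.zfill or str.rjust would be the usual way to pad; the regex form is kept here only because it mirrors the S/R above.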

            Cheers,

            guy038

              Kirk Weir @Scott Sumner

              @Scott-Sumner & @guy038

              You guys definitely make me feel very dumb…and very thankful!

              I’m about to give it a go so will let you know how it goes. Just have to read it 20 times or so to process your thinking/coding language.

              Thank you so much for your replies.

                Kirk Weir

                Success! With some minor alterations to your feedback, you guys have chopped possibly a 3-4 week job down to a few days!

                This is powerful stuff…in the right hands/minds!

                FYI, the alteration required was to add back in the hyperlink="history:/ to the (?-si)MS_MSCP_3_P2_2/.+C((\d)?(\d)?\d) search, but only due to the fact that there are other instances of MS_MSCP_3_P2_2/ throughout the code (which I did not tell you about!).

                Thanks a lot guys! Legends!!!
                Kirk

                  guy038

                  Hello, @kirk-weir

                  For newcomers to the concepts and syntax of regular expressions, begin with this article in the N++ Wiki :

                  http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

                  In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

                  http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

                  http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

                  • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

                  • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


                  You may also look for valuable information on the sites below :

                  http://www.regular-expressions.info

                  http://www.rexegg.com

                  http://perldoc.perl.org/perlre.html

                  Be aware that, as any documentation, it may contain some errors ! Anyway, if you detected one, that’s good news : you’re improving ;-))

                  Cheers,

                  guy038

                    Kirk Weir

                    Hi @guy038,

                    Excellent, thanks a lot for the info. You are most kind!

                    Regards,
                    Kirk

                      AdrianHHH

                      Another way of inserting (missing) leading zeros for numbers. This takes two search-and-replace operations. The first step is to insert the wanted number of leading zeros at the front of every number. The second step is to remove any unneeded zeros.

                      For the example, where numbers matching C1| are to be changed to P001|, i.e. adding two zeros.
                      First step: Replace (C)(\d{1,2}\|) with \100\2. (As Notepad++ only allows nine groups there is no ambiguity with the \100 part, it means \1 then 00.)
                      Second step: Replace (C)0+(\d{3}\|) with \1\2.
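
A rough Python equivalent of the two steps, for experimenting outside Notepad++. The function name pad_two_step is hypothetical. Note that in Python the replacement is written \g<1>00, because \100 would not be read there as "group 1, then 00".

```python
import re


def pad_two_step(text):
    # Step 1: put two zeros in front of every 1- or 2-digit number that follows C.
    step1 = re.sub(r"(C)(\d{1,2}\|)", r"\g<1>00\2", text)
    # Step 2: drop the leading zeros that are not needed for a three-digit number.
    return re.sub(r"(C)0+(\d{3}\|)", r"\1\2", step1)
```

For example, pad_two_step("C1| C12| C123|") leaves the three-digit number alone and pads the other two.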

                        Scott Sumner @AdrianHHH

                        @AdrianHHH

                        As Notepad++ only allows nine groups…

                        I didn’t check your proposed solution, but rather I just wanted to point out that Notepad++ can handle more than 9 captured groups. For example, there’s a regex replacement in this thread that uses 52 capture groups!

                          guy038

                          Hello, @adrianhhh, @kirk-weir, @scott-sumner and All,

                          As soon as I saw your post and understood your “philosophy” for adding missing leading 0’s, my previous regex S/R, below, looked excessively complicated !!

                          SEARCH ((\d)?(\d)?(\d)?(\d)?(\d)?(\d)?(\d)?\d)

                          REPLACE (?2:0)(?3:0)(?4:0)(?5:0)(?6:0)(?7:0)(?8:0)\1

                          Indeed, your method looks better and simpler. In addition, I managed to simplify your two regex S/R :-))


                          So, let’s start with the original text, below, with some numbers, preceded by the letter C :

                          C12345
                          C123456
                          C123
                          C1234567
                          C1234
                          C12
                          C12345678
                          C1
                          

                          I omitted the last | character, which is useless for our discussion. Now, as we’re looking for a formatted list of eight-digit numbers, we need to insert a string of seven 0’s right after the letter C. To do so, I use the following S/R :

                          SEARCH (?<=C)

                          REPLACE 0000000

                          And I get the text :

                          C000000012345
                          C0000000123456
                          C0000000123
                          C00000001234567
                          C00000001234
                          C000000012
                          C000000012345678
                          C00000001
                          

                          Notes :

                          • As the search is only a look-behind construction, it matches the zero-length position, right after the letter C

                          • And, at that position, it simply inserts, in replacement, the 0000000 string !!


                          Now, to get the aligned table of numbers, padded out with some 0’s, I chose the following S/R, which deletes the unnecessary 0’s, rather than rewriting the letter C and the digits to keep !

                          SEARCH (?-s)(?<=C).*(?=\d{8})

                          REPLACE Leave EMPTY

                          We obtain, at once, the correct list below :

                          C00012345
                          C00123456
                          C00000123
                          C01234567
                          C00001234
                          C00000012
                          C12345678
                          C00000001
                          

                          Magic, isn’t it !

                          Notes :

                          • As usual, the modifier (?-s), ensures that the special dot character will match standard characters, only !

                          • The search regex looks for any amount, even empty, of standard characters ( .* ), if two conditions are true :

                            • This range of characters must be preceded by the C letter ( (?<=C) )

                            • This range of characters must be followed by an eight digits number ( (?=\d{8}) )

                          • As the replacement part is EMPTY, this range is just deleted


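                          To sum up, in order to obtain an aligned list of n-digit numbers, padded out with a particular character :

                          • Choose the fixed string located right before the padding characters to insert, and match the position after it with a look-behind. Note that the first replacement zone could just as well contain spaces, dots or any other padding character ( n - 1 of them are enough ). In that case, the look-ahead of the second S/R must accept that character too, for instance (?=[.\d]{n}) for dots

                          • Repeat the look-behind in the second search zone and use the (?=\d{n}) look-ahead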
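
Here is a small Python sketch of the same two consecutive S/R, assuming zeros as the padding character and the letter C as the fixed string. pad_after_c is an invented name and the width parameter stands for n.

```python
import re


def pad_after_c(text, width=8):
    # First S/R: the look-behind matches the zero-length position right after
    # each C, where width - 1 zeros are inserted.
    step1 = re.sub(r"(?<=C)", "0" * (width - 1), text)
    # Second S/R: delete everything between C and the last `width` digits of the
    # line. The dot does not match line breaks in Python by default, as with (?-s).
    return re.sub(r"(?<=C).*(?=\d{%d})" % width, "", step1)


original = "C12345\nC123456\nC123\nC1234567\nC1234\nC12\nC12345678\nC1"
aligned = pad_after_c(original)
```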


                          As you see, Adrian, it’s a good lesson ! Very often, two simple consecutive S/R are better than a single complicated one :-D

                          Best Regards,

                          guy038

                            AdrianHHH

                            Thanks for the clarification @Scott-Sumner. My wording may have been poor; Notepad++ allows more than nine groups but the backslash only allows nine.

                            I have just re-checked the Boost page on replacements (see http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html ). The table of escape sequences shows \D as “If D is a decimal digit in the range 1-9, then outputs the text that matched sub-expression D.” Thus this backslash form only allows 9 captures. Additional captures can be accessed as shown in the Placeholder Sequences table by using ${n}, which “Outputs what matched the n’th sub-expression”.

                              AdrianHHH

                              @guy038 I have seldom used look-behinds or look-aheads and I do not remember their syntax. For the times I have needed complicated search and replaces the performance difference between the non-look-behind(or ahead) form and the form with look-behinds(or aheads) is much less than the mental effort it would take me to change my approach. Having said that, I am very happy that you have found another, possibly neater, way of using my idea.

                                James Phoden

                                Hello, sorry to jump onto this thread, but I thought it better than starting a new one as my issue is related.

                                I need to find and replace this
                                Original String
                                “Prod_Data:Logos:Race Logos:illing Logo&Maps:Maps:AWT:6f AWT.eps”
                                Required String
                                “\\grp-pserv-01wl\Prod_Data\Logos\Race Logos\illing Logo&Maps\Maps\AWT\6f AWT.eps”

                                So basically replace the "Prod_Data: with "\\grp-pserv-01wl\Prod_Data\
                                and : with \ but only replace them if the original string is within the quote marks “”.

                                  Scott Sumner @James Phoden

                                  @James-Phoden

                                  So the part that makes your situation ugly is that I presume there can be a variable amount of :xxxx in your real data–you didn’t say… For example, maybe all of the following are valid things you want to match for replacement:

                                  "Prod_Data:Logos:Race Logos:illing Logo&Maps:Maps:AWT:6f AWT.eps"
                                  "Prod_Data:Logos:Race Logos:illing Logo&Maps:Maps:AWT AWT.eps"
                                  "Prod_Data:Logos:Race Logos:illing Logo&Maps:Maps AWT.eps"
                                  "Prod_Data:Logos:Race Logos:illing Logo&Maps AWT.eps"
                                  "Prod_Data:Logos:Race Logos AWT.eps"
                                  "Prod_Data:Logos AWT.eps"
                                  

                                  If this is NOT the case and you always have SIX sets of :xxxx then the situation is a lot less ugly. But, moving forward with a variable (but bounded) count (from 1 to 10 occurrences, for example), try this [and PLEASE use copy-n-paste… :-) ]:

                                  Find what zone: (?-s)"Prod_Data(?::(.+?))(?::(.+?))?(?::(.+?))?(?::(.+?))?(?::(.+?))?(?::(.+?))?(?::(.+?))?(?::(.+?))?(?::(.+?))?(?::(.+?))?"
                                  Replace with zone: "\\\\grp-pserv-01wl\\Prod_Data(?1\\${1})(?2\\${2})(?3\\${3})(?4\\${4})(?5\\${5})(?6\\${6})(?7\\${7})(?8\\${8})(?9\\${9})(?10\\${10})"
                                  Search mode radio-button: Regular expression

                                  What this is doing is matching whatever follows the individual colons (after your "Prod_Data leading text is matched) into the capture groups 1-10. At replacement time, the colons (converted to backslashes) plus the captured groups are conditionally inserted into the output stream. The conditional syntax is necessary because of the variable number of occurrences of \xxxx that might be needed.

                                  Thus, each (?::(.+?))? in the FW string captures a :xxxx – the second occurrence of : in this is the real/literal colon…the first colon is part of (?: which is just syntax saying “group the stuff in the parentheses but don’t capture it for later use”.

                                  And each (?y\\${y}) (where y = 1…10) in the RW string represents a backslash and the original xxxx. When group 8 (for example) doesn’t exist the syntax (?8 will evaluate to false and whatever occurs between the (?8 and the next ) will NOT be part of the replacement data.

                                  Thus the data above will convert to the following:

                                  "\\grp-pserv-01wl\Prod_Data\Logos\Race Logos\illing Logo&Maps\Maps\AWT\6f AWT.eps"
                                  "\\grp-pserv-01wl\Prod_Data\Logos\Race Logos\illing Logo&Maps\Maps\AWT AWT.eps"
                                  "\\grp-pserv-01wl\Prod_Data\Logos\Race Logos\illing Logo&Maps\Maps AWT.eps"
                                  "\\grp-pserv-01wl\Prod_Data\Logos\Race Logos\illing Logo&Maps AWT.eps"
                                  "\\grp-pserv-01wl\Prod_Data\Logos\Race Logos AWT.eps"
                                  "\\grp-pserv-01wl\Prod_Data\Logos AWT.eps"
                                  

                                  If this (or ANY posting on the Notepad++ Community site) is useful, don’t reply with a “thanks”, simply up-vote ( click the ^ in the ^ 0 v area on the right ).
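
For comparison, the same conversion can be sketched in Python. This is a different technique from the conditional replacement above: the whole quoted string is matched once and a function rewrites the colons inside it, so the number of :xxxx parts is not bounded. QUOTED and convert_paths are invented names, and the sketch assumes straight double quotes around each path.

```python
import re

# A quoted string that starts with Prod_Data: and contains no further quote.
QUOTED = re.compile(r'"Prod_Data:([^"\r\n]*)"')


def convert_paths(text):
    def repl(match):
        # Only the colons inside the quoted path become backslashes.
        rest = match.group(1).replace(":", "\\")
        return '"\\\\grp-pserv-01wl\\Prod_Data\\' + rest + '"'

    # A function is used as the replacement so that the backslashes are
    # inserted literally instead of being read as escape sequences.
    return QUOTED.sub(repl, text)
```

Colons outside the quote marks are left untouched, which matches the cautious assumption made above about the real data.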

                                    guy038

                                    Hi, @Scott-sumner and @james-phoden,

                                    Scott, why don’t you use the simple S/R, below :

                                    SEARCH :|(?-i)(Prod_Data)

                                    REPLACE \\(?1\\grp-pserv-01wl\\\1)

                                    Of course, it works strictly with your original text. Maybe you want to avoid colons placed outside a "...." block, don’t you ?
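
A quick Python sketch of this simple S/R makes the caveat visible. The conditional part of the replacement is emulated with a function, and simple_convert is a hypothetical name.

```python
import re


def simple_convert(text):
    def repl(match):
        if match.group(1) is not None:
            # Prod_Data was matched: prefix it with the server name.
            return "\\\\grp-pserv-01wl\\" + match.group(1)
        # A colon was matched: it simply becomes a backslash.
        return "\\"

    # Matching is case-sensitive by default in Python, like (?-i).
    return re.sub(r":|(Prod_Data)", repl, text)
```

Every colon is replaced, wherever it sits, so simple_convert("a:b") also changes a colon that is outside any quoted block.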

                                    Cheers,

                                    guy038

                                      Scott Sumner @guy038

                                      @guy038

                                      Yea, it’s the difference between wanting to help and making too many/few assumptions about a questioner’s data. In this case I wouldn’t go so far to assume that colons only appear in these places, but who knows?

                                        guy038

                                        Hello, @scott-sumner and All,

                                        I’ve studied the general case of searching for a specific character ONLY IF it is located inside a range of characters with delimiters.

                                        Now, two cases are possible :

                                        • Case A : an area with a same starting and ending character, as, for instance, '.....' or "....."

                                        • Case B : an area with a different starting and ending character, as, for instance, (.....), [.....], {.....} or <.....>

                                        Notes :

                                        • For our discussion, we are supposed to look for the colon character

                                        • For case A, I chose the double quotes delimiter ", as common boundary

                                        • For case B, I chose the start delimiter < and the end delimiter >


                                        Let’s begin with the easier form !

                                        Case B : A possible regex would be :

                                        SEARCH :(?=[^<\r\n]*>)

                                        REPLACE Any string or character, even EMPTY

                                        Note that this regex looks for a colon character, ONLY IF followed by a range, possibly empty, of characters, different from the first delimiter < and from the EOL characters \r and \n , till the ending delimiter >

                                        On the test example, below, the regex finds all colon characters, located inside the <.....> areas exclusively ( the ones which are underlined ) Fine !

                                        1:23:<This:is a: tiny>text:to :see<if :my :logic:>is: correct:<I: hope:that>: all: will:be<::fine,: indeed>: ! :<:>78:9
                                                  ¯    ¯                      ¯   ¯     ¯               ¯     ¯                    ¯¯     ¯              ¯   
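
The Case B regex can be tried as is in Python, since the look-ahead syntax is the same. The sketch below is only an illustration: INSIDE_ANGLE and replace_inside_angle are invented names, and it assumes the < and > delimiters are balanced and not nested.

```python
import re

# A colon followed, before any "<" or line break, by a closing ">".
INSIDE_ANGLE = re.compile(r":(?=[^<\r\n]*>)")


def replace_inside_angle(text, replacement=""):
    # A function is used so that the replacement string is inserted literally.
    return INSIDE_ANGLE.sub(lambda match: replacement, text)
```

Calling replace_inside_angle("1:23:<This:is a: tiny>text:to", "-") changes only the two colons between < and >.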
                                        

                                        Now, in case A, the annoying thing is that it’s impossible to distinguish the two delimiters ! So, we’re going to cheat a bit ! First, we’ll replace, for instance, any area "....." by an oriented area as, for instance, #"....."@. Of course, these new boundaries must be absent from the present contents of the file !

                                        So, assuming the original text :

                                        1:23:"This:is a: small"text:to :see"if :my :logic:"is: correct:"I: hope:that": all: will:be"::fine,: indeed": ! :":"78:9
                                        

                                        The simple S/R :

                                        SEARCH ".*?"

                                        REPLACE #$0@

                                        would get the following text :

                                        1:23:#"This:is a: small"@text:to :see#"if :my :logic:"@is: correct:#"I: hope:that"@: all: will:be#"::fine,: indeed"@: ! :#":"@78:9
                                        

                                        Accordingly, the correct regex becomes :

                                        SEARCH :(?=[^#\r\n]*@)

                                        REPLACE Any string or character, even EMPTY

                                        Again, only the colons, inside the areas, which are underlined, are matched by the regex !

                                        1:23:#"This:is a: small"@text:to :see#"if :my :logic:"@is: correct:#"I: hope:that"@: all: will:be#"::fine,: indeed"@: ! :#":"@78:9
                                                   ¯    ¯                         ¯   ¯     ¯                 ¯     ¯                      ¯¯     ¯                ¯   
                                        

                                        To end with, use the simple regex :

                                        SEARCH #|@

                                        REPLACE Empty

                                        in order to get the original areas "....."
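
The three S/R of Case A can be chained in a short Python sketch. As in the text, it assumes that the temporary boundaries # and @ are absent from the original contents (and from the replacement); replace_inside_quotes is a hypothetical helper.

```python
import re


def replace_inside_quotes(text, replacement=""):
    # S/R 1: turn every "....." area into an oriented #"....."@ area.
    marked = re.sub(r'".*?"', r"#\g<0>@", text)
    # S/R 2: replace the colons that are followed by @ before any # or line break.
    replaced = re.sub(r":(?=[^#\r\n]*@)", lambda match: replacement, marked)
    # S/R 3: remove the temporary boundaries to get the "....." areas back.
    return re.sub(r"#|@", "", replaced)
```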

                                        Et voilà !

                                        Cheers,

                                        guy038
