find and replace help



  • hi all this is my first post.

    i have a text file with something like this

    IF NAME==“C15_x_33.9”
    DEPTH=115
    ENDIF
    IF NAME==“C12_x_30”
    DEPTH=1
    12
    ENDIF

    and i need to do two things

    1. place the name in front of all the lines
    2. if the name has a “.” like 33.9, it needs to be replaced with an underscore, like 33_9

    so the final format would look like this:

    IF NAME==“C15_x_33.9”
    C15_x_33_9_DEPTH=115
    ENDIF
    IF NAME==“C12_x_30”
    C12_x_30_DEPTH=1
    12
    ENDIF

    so i have thousands of these and i have been trying and im just about to give up. thought i would give this community a shot before i quit

    i have been trying for hours and hours using regular expressions and i cant get it to work.
    any help would be greatly appreciated.

    thank you,
    Jg



  • Hello Joe Grant,

    I’m not particularly good in regex but I hope I have a solution for you.
    From the given example I would solve it like so

    Find the C… name, the line break and the DEPTH… line, then write the C… line back unchanged and put the C… name in front of the DEPTH… line.

    Find what: (C.*)(")\r\n(DEPTH.*)
    Replace with: \1\2\r\n\1_\3
    

    Next is to replace the dot with underscore

    Find what: (C.*)(\.)(.*DEPTH.*)
    Replace with: \1_\3
    

    So, as you see, \1 is the first captured group, \2 the second, and so on …

    This regex works on the example provided, but real data might be
    affected differently if my assumptions aren’t valid.
    Assumptions like: on the IF… line, a capital C appears only at the start of the name part,
    never before or afterwards and …
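
For anyone who would rather script this than use the Replace dialog, the two steps above can be sketched in Python. This is only an illustration under assumptions: the sample text uses \n line endings instead of \r\n, the DEPTH value of the second block is made up, and Python's `re` module stands in for the Notepad++ regex engine, so small differences in behaviour are possible.

```python
import re

# Sample data modelled on the question. The \n line endings and the second
# DEPTH value are assumptions made for this sketch.
text = (
    'IF NAME=="C15_x_33.9"\n'
    'DEPTH=115\n'
    'ENDIF\n'
    'IF NAME=="C12_x_30"\n'
    'DEPTH=120\n'
    'ENDIF\n'
)

# Step 1: capture the name, the closing quote and the DEPTH line, then write
# the name line back unchanged and prefix the DEPTH line with the name.
step1 = re.sub(r'(C.*)(")\n(DEPTH.*)', r'\1\2\n\1_\3', text)

# Step 2: on lines that contain DEPTH, replace the dot in the name with "_".
step2 = re.sub(r'(C.*)(\.)(.*DEPTH.*)', r'\1_\3', step1)

print(step2)
```

The same caveat applies as for the dialog version: the patterns rely on a capital C appearing only at the start of the name.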

    Cheers
    Claudia



  • Hello Joe,

    Many modifications can be done with the help of regular expressions :-))

    First of all, we miss some points about your file :

    • May the different names contain more than one dot as, for instance, C15.x.33.9 ? I suppose NOT, as the final number seems to be either an integer or a float number, doesn’t it ?

    • May the IF - ENDIF structure contain more than one line ?

    For instance :

    IF NAME==“C15_x_33.9”
    DEPTH=115
    LENGTH=70
    WIDTH=30
    ENDIF
    

    which should be, therefore, replaced with :

    IF NAME==“C15_x_33.9”
    C15_x_33_9_DEPTH=115
    C15_x_33_9_LENGTH=70
    C15_x_33_9_WIDTH=30
    ENDIF
    

    I just relied on your present example, with an IF - ENDIF structure which contains ONE line only !

    • When I copied your text into a new tab, with CTRL-C / CTRL-V, the two standard double quotes ( " ) had been changed into the LEFT DOUBLE QUOTATION MARK “ ( \x{201c} ) and the RIGHT DOUBLE QUOTATION MARK ” ( \x{201d} ). I will assume that your file rather uses the standard QUOTATION MARK, doesn’t it ?

    Well, with these hypotheses, and in addition to Claudia’s solution, I would suggest the following S/R :

    SEARCH (?-s)^IF NAME=="(?|(.+)(\.)(.*)|(.+))"\R\K

    REPLACE \1_(?2\3_)

    • Don’t forget to select the Regular expression search mode !

    • Click on the Replace All button ONLY ( Due to the \K syntax, you must NOT use the Replace button !! )


    At first sight, that regex seems difficult, but it’s a nice opportunity to explore :

    • The internal modifiers (?s)

    • The branch reset alternative pattern (?|...|...|...|...)

    • The line ending escape sequence \R

    • The kept back form \K

    • The conditional replacement pattern (?N.....), where N is a group number


    So :

    • The (?-s) form is a modifier which means that the dot character matches a standard character only. The opposite form (?s) means that the dot can match any character, even end of line characters.

    • If your condition IF NAME may occur in lowercase, just add the case-insensitive modifier i => your regex will, then, begin with (?i-s)

    • Note that these modifiers take priority over the corresponding options of the Replace dialog ( Match case and . matches newline options )

    • The part ^IF NAME==" just tries to match the literal string IF NAME==", at the beginning of a line

    • The part (?|(.+)(\.)(.*)|(.+))" is an alternative, that looks :

      • For any non null range of characters, followed with a literal dot, then followed with any range, possibly null, of characters
        OR
      • For any non null range of characters
    • In that piece of the regex :

      • The literal dot has to be escaped, as it’s a special character in regexes

      • The dot and the parts before and after it are each surrounded by parentheses, so that each of them forms a single group, which can be re-used in the replacement regex

      • Due to the ?| syntax at the beginning of the alternative (....|....), the group numbering is reset, for each branch of the alternative :

        • If the first alternative is chosen ( case where the name contains a dot ), the part before the dot is group 1, the dot represents the group 2 and the part after the dot is the group 3

        • If the second alternative matched ( case the name does NOT contain a dot ), the single group (.+) is considered, again, to be the group 1

      • Whatever alternative matches the name, it must match the ending quote character

    • The \R exactly represents the atomic group (?>\x0d\x0a?|[\x0a-\x0c\x85\x{2028}\x{2029}]), but, practically, we just have to remember that it matches any standard EOL : \r\n, for Windows files, \n, for Unix files or \r for old Mac files

    • Finally, in the search regex, due to the \K syntax, everything already matched ( that is to say, the complete line with its EOL characters ) is “forgotten”, so the final regex matched is, only, the null string, located between the EOL character \n and the first letter D, of the word DEPTH
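
Two of the building blocks described in the list above, the (?s) modifier and the \R escape, can be tried outside Notepad++ with a small Python sketch. It rests on assumptions: Python's standard `re` module is not the Boost engine, it has no \R as far as I know, and its dot (without the s flag) stops only at \n. The `EOL` pattern below is a hand-written stand-in for \R, not the engine's own definition.

```python
import re

# Without the s flag, "." stops at a line feed; with (?s) it matches
# every character, line endings included.
first_line_only = re.match(r'.*', "ab\ncd").group()
across_lines = re.match(r'(?s).*', "ab\ncd").group()

# Hand-written stand-in for \R (an assumption of this sketch).
# \r\n? is tried first, so a Windows line ending counts as ONE break.
EOL = r'(?:\r\n?|[\n\x0b\x0c\x85\u2028\u2029])'

# Windows, Unix and old Mac line endings mixed in one string.
sample = "one\r\ntwo\nthree\rfour"
parts = re.split(EOL, sample)

print(repr(first_line_only), repr(across_lines), parts)
```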

    This null string is, then, replaced with :

    • The group 1 ( part of the name before the dot OR the entire name ), followed by an underscore => \1_

    • If a dot has been found in the name ( if group 2 exists ), we must re-write the part of the name after the dot ( group 3 ), followed, again, by an underscore => (?2\3_). Note that the general form of a conditional replacement is (?N....:....), where N is a group number. For instance, (?4abc:xyz) means that the string abc is written if group 4 EXISTS, and that the string xyz is written if group 4 could NOT be defined
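
The search and replacement described above can be approximated in Python for readers who want to batch-process files. This is a sketch under assumptions, not the Notepad++ behaviour itself: Python's standard `re` module offers neither the branch reset group, nor \K, nor the conditional replacement, so the IF line is captured and written back instead of being "forgotten", and a small function plays the role of (?2\3_). The names `PATTERN`, `add_prefix` and `prefix_names` are hypothetical, and the second DEPTH value in the sample is made up.

```python
import re

# Group 1: the whole IF line with its line ending (the part \K would discard).
# Groups 2 and 3: name parts before and after the dot.
# Group 4: the entire name, when it contains no dot.
PATTERN = re.compile(
    r'^(IF NAME=="(?:(.+)\.(.*)|(.+))"\r?\n)',
    re.MULTILINE,
)

def add_prefix(match):
    """Write the IF line back and append the prefix for the next line."""
    header, before, after, whole = match.groups()
    if before is not None:          # a dot was found in the name
        prefix = f"{before}_{after}_"
    else:                           # no dot: use the name as it is
        prefix = f"{whole}_"
    return header + prefix

def prefix_names(text):
    return PATTERN.sub(add_prefix, text)

sample = (
    'IF NAME=="C15_x_33.9"\n'
    'DEPTH=115\n'
    'ENDIF\n'
    'IF NAME=="C12_x_30"\n'
    'DEPTH=120\n'
    'ENDIF\n'
)
result = prefix_names(sample)
print(result)
```

Like the regex it imitates, this sketch only prefixes the single line that follows the IF line, and only the last dot of a name is turned into an underscore.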

    Best Regards,

    guy038

    P.S. :

    You’ll find good documentation about the Boost C++ Regex library ( similar to PCRE, the Perl Compatible Regular Expressions ), used by Notepad++ since the 6.0 version, at the TWO addresses below :

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part



  • Hi guy038,

    AGAIN, a nice one, and a very good description too; even I understood it.
    But, there is always a but: did you notice that your regex seems to break
    the Replace function (I don’t know how else to say it)?
    What I mean is, if you use your regex and press the Find Next button,
    it selects the DEPTH… line, and if you press the Replace button nothing
    gets changed, whereas if you press the Replace All button, it is replaced.
    Do you think this is a bug, or is it because of the complex regex?

    Tested with npp6.8.7 and 6.8.8 on windows 7 x64.

    Cheers
    Claudia



  • Hi Claudia,

    No, It’s not related to the complexity of the regex ! It’s just that the step-by-step replace doesn’t work at all, as soon as the search regex contains, at least, one \K form :-(( Though I don’t know exactly why !?

    Consider the subject string below :

    abc
    abcdef
    abcdefghi
    abcdefghidefjkl
    

    With the simple S/R SEARCH abc\Kdef and REPLACE 123, if I click on the Replace All button, we get the right text :

    abc
    abc123
    abc123ghi
    abc123ghidefjkl
    

    Note that the second string def has not been changed, because it wasn’t just after an abc string. That’s correct !

    On the contrary, if I click several times on the Replace button, nothing changes !!!

    Cheers,

    guy038

    P.S.:

    I’ve just realized that the bug exists too, if we use a look-behind, instead of the \K form !

    So, the S/R SEARCH (?<=abc)def and REPLACE 123 does the job, if you click on the Replace All button, ONLY !

    Remember that, due to the look-behind feature, this regex tries to match a def string, only if preceded by the string abc
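
The look-behind version of this test can also be checked with a short Python sketch. It assumes Python's standard `re` module, which, as far as I know, has no \K form but does support fixed-width look-behinds. A replace-all there gives the same text as the Replace All button in the example above.

```python
import re

# The four test lines from the example above.
subject = "abc\nabcdef\nabcdefghi\nabcdefghidefjkl\n"

# Replace "def" only where it is immediately preceded by "abc".
replaced = re.sub(r'(?<=abc)def', '123', subject)

print(replaced)
```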

