find and replace help

Joe Grant

hi all this is my first post.

i have a text file with something like this

IF NAME==“C15_x_33.9”
DEPTH=115
ENDIF
IF NAME==“C12_x_30”
DEPTH=112
ENDIF

and i need to do two things

1) place the name in front of all the lines
2) if the name has a “.” like 33.9, it needs to be replaced with an underscore 33_9
so the final format would look like this:

IF NAME==“C15_x_33.9”
C15_x_33_9_DEPTH=115
ENDIF
IF NAME==“C12_x_30”
C12_x_30_DEPTH=112
ENDIF

so i have thousands of these and i have been trying and im just about to give up. thought i would give this community a shot before i quit

i have been trying for hours and hours using regular expressions and i cant get it to work.
any help would be greatly appreciated.

thank you,
Jg

Claudia Frank

Hello Joe Grant,

I’m not particularly good at regex, but I hope I have a solution for you.
From the given example, I would solve it like so:

Find the C… line, the line break and the DEPTH… line, then keep the C… line as it is and put the name from the C… line in front of the DEPTH… line.

Find what: (C.*)(")\r\n(DEPTH.*)
Replace with: \1\2\r\n\1_\3

Next is to replace the dot with underscore

Find what: (C.*)(\.)(.*DEPTH.*)
Replace with: \1_\3

So, as you see, \1 refers to the first capture group, \2 to the second, and so on …

This regex works on the provided example, but real data
might be affected differently if my assumptions
aren’t valid. Assumptions like: the IF… line only has the C char in the name part,
never before or afterwards, and …
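
For anyone who wants to try the two steps outside Notepad++, here is a minimal Python sketch of the same idea. It assumes straight double quotes and "\n" line endings (so \r\n becomes \n in the patterns), and the variable names are only for illustration.

```python
import re

# Sample data from the question, with straight quotes and "\n" line endings
# (an assumption; a Windows file would use "\r\n").
sample = (
    'IF NAME=="C15_x_33.9"\n'
    'DEPTH=115\n'
    'ENDIF\n'
    'IF NAME=="C12_x_30"\n'
    'DEPTH=112\n'
    'ENDIF\n'
)

# Step 1: keep the name line and copy the name (group 1) in front of DEPTH.
step1 = re.sub(r'(C.*)(")\n(DEPTH.*)', r'\1\2\n\1_\3', sample)

# Step 2: on the prefixed DEPTH line, replace the dot with an underscore.
step2 = re.sub(r'(C.*)(\.)(.*DEPTH.*)', r'\1_\3', step1)

print(step2)
```

The IF line keeps its dot because step 2 only matches lines that contain DEPTH after the dot.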

Cheers
Claudia

guy038

Hello Joe,

Many modifications can be done with the help of regular expressions :-))

First of all, some points about your file are missing :

- May the different names contain more than one dot as, for instance, C15.x.33.9 ? I suppose NOT, as the final number seems to be either an integer or a float number, doesn’t it ?
- May the IF - ENDIF structure contain more than one line ?

For instance :

IF NAME==“C15_x_33.9”
DEPTH=115
LENGTH=70
WIDTH=30
ENDIF

which should be, therefore, replaced with :

IF NAME==“C15_x_33.9”
C15_x_33_9_DEPTH=115
C15_x_33_9_LENGTH=70
C15_x_33_9_WIDTH=30
ENDIF

I just relied on your present example, with an IF - ENDIF structure which contains ONE line only !
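
If the blocks do turn out to contain several lines, a single regex gets awkward, and a small script may be easier. The following is only a sketch in Python, assuming straight double quotes, one IF NAME== line per block, and that every dot in the name should become an underscore. The function name prefix_block_lines is made up for this example.

```python
import re

def prefix_block_lines(text):
    """Prefix every line between IF NAME=="..." and ENDIF with the name."""
    out = []
    prefix = None  # name with dots replaced, or None when outside a block
    for line in text.splitlines():
        m = re.match(r'IF NAME=="(.+)"$', line)
        if m:
            # Entering a block: remember the name, dots turned into underscores.
            prefix = m.group(1).replace('.', '_')
            out.append(line)
        elif line == 'ENDIF':
            # Leaving the block: stop prefixing.
            prefix = None
            out.append(line)
        elif prefix is not None:
            out.append(f'{prefix}_{line}')
        else:
            out.append(line)
    return '\n'.join(out)

block = 'IF NAME=="C15_x_33.9"\nDEPTH=115\nLENGTH=70\nWIDTH=30\nENDIF'
print(prefix_block_lines(block))
```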

When I copied your text, in a new tab, with CTRL-C / CTRL-V, the two standard double quotes ( " ) were changed into the LEFT DOUBLE QUOTATION MARK “ ( \x{201c} ) and the RIGHT DOUBLE QUOTATION MARK ” ( \x{201d} ). I will assume that your file actually uses the standard QUOTATION MARK, doesn’t it ?

Well, with these hypotheses, and in addition to Claudia’s solution, I would suggest the following S/R :

SEARCH (?-s)^IF NAME=="(?|(.+)(\.)(.*)|(.+))"\R\K

REPLACE \1_(?2\3_)

Don’t forget to select the Regular expression search mode !
Click on the Replace All button ONLY ( Due to the \K syntax, you must NOT use the Replace button !! )

At first sight, that regex seems difficult, but it’s a nice opportunity to explore :

The internal modifiers (?s)
The branch reset alternative pattern (?|...|...|...|...)
The line ending escape sequence \R
The kept back form \K
The conditional replacement pattern (?N.....), where N is a group number

So :

The (?-s) form is a modifier that means that the dot character matches a standard character only ( not an end of line character ). The opposite form (?s) means that the dot can match any character, even end of line characters.
If your condition IF NAME may occur in lowercase, just add the insensitive modifier i => your regex will, then, begin with (?i-s)
Note that these modifiers take priority over the corresponding options in the Replace dialog ( Match case and . matches newline options )
The part ^IF NAME==" just tries to match the literal string IF NAME==", at the beginning of a line
The part (?|(.+)(\.)(.*)|(.+))" is an alternation, that looks :
- For any non null range of characters, followed with a literal dot, then followed with any range, possibly null, of characters
  OR
- For any non null range of characters
In that piece of the regex :
- The literal dot has to be escaped, as it’s a special character in regexes
- Also, the dot and the parts before and after it are surrounded by parentheses, so that each one forms a separate group, which can be re-used in the replacement regex
- Due to the ?| syntax at the beginning of the alternative (....|....), the group numbering is reset, for each branch of the alternative :
  - If the first alternative is chosen ( case where the name contains a dot ), the part before the dot is group 1, the dot represents the group 2 and the part after the dot is the group 3
  - If the second alternative matched ( case where the name does NOT contain a dot ), the single group (.+) is considered, again, to be the group 1
- Whatever alternative matches the name, it must match the ending quote character
The \R exactly represents the atomic group (?>\x0d\x0a?|[\x0a-\x0c\x85\x{2028}\x{2029}]), but, practically, we just have to remember that it matches any standard EOL : \r\n, for Windows files, \n, for Unix files or \r for old Mac files
Finally, in the search regex, due to the \K syntax, everything already matched ( that is to say, the complete line with its EOL characters ) is “forgotten”, so the final regex matched is, only, the null string, located between the EOL character \n and the first letter D, of the word DEPTH

This null string is, then, replaced with :

The group 1 ( part of the name before the dot OR the entire name ), followed by an underscore => \1_
If a dot has been found in the name ( if group 2 exists ), we must re-write the part of the name after the dot ( group 3 ), followed, again, by an underscore => (?2\3_). Note that the general form of a conditional replacement is (?N....:....), where N is a group number. For instance, (?4abc:xyz) means that the string abc is written if group 4 EXISTS, and the string xyz is written if group 4 could NOT be defined
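
To see the same logic outside Notepad++, here is a rough Python sketch. Python's standard re module does not support the branch reset (?|...), \K, \R or conditional replacements, so this is an emulation and not the regex above: named groups stand in for the branch reset, a callback stands in for the conditional replacement, and the whole matched line is written back in place of \K. It assumes straight double quotes and \n or \r\n line endings only. The names PATTERN and add_prefix are made up for the example.

```python
import re

# One alternative per case: name with a dot (before / dot / after),
# or name without a dot (whole). The line ending is part of the match.
PATTERN = re.compile(
    r'^IF NAME=="(?:(?P<before>.+)(?P<dot>\.)(?P<after>.*)|(?P<whole>.+))"\r?\n',
    re.MULTILINE,
)

def add_prefix(match):
    # Emulates the conditional replacement: if the dot group took part
    # in the match, rebuild the name with an underscore in place of the dot.
    if match.group('dot'):
        prefix = f"{match.group('before')}_{match.group('after')}_"
    else:
        prefix = f"{match.group('whole')}_"
    # Emulates \K: the matched IF line is kept and the prefix is appended,
    # so it lands right in front of the next line.
    return match.group(0) + prefix

sample = (
    'IF NAME=="C15_x_33.9"\n'
    'DEPTH=115\n'
    'ENDIF\n'
    'IF NAME=="C12_x_30"\n'
    'DEPTH=112\n'
    'ENDIF\n'
)

result = PATTERN.sub(add_prefix, sample)
print(result)
```

As in the Notepad++ regex, the greedy (.+) means that only the last dot of a name is treated as the separator.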

Best Regards,

guy038

P.S. :

You’ll find good documentation about the new Boost C++ Regex library ( similar to the Perl Compatible Regular Expressions ), used by Notepad++ since the 6.0 version, at the TWO addresses below :

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

The FIRST link explains the syntax, of regular expressions, in the SEARCH part
The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part

Claudia Frank

Hi guy038,

AGAIN, a nice one and a very good description too, even I understood it.
But, there is always a but, did you notice that your regex seems to break
the replace (don’t know how to say it in other words) function?
What I mean is, if you use your regex and press the Find Next button,
it selects the DEPTH… line, and if you press the Replace button nothing
gets changed, whereas if you press the Replace All button, it will be replaced.
Do you think this is a bug or is it because of the complex regex?

Tested with npp6.8.7 and 6.8.8 on windows 7 x64.

Cheers
Claudia

guy038

Hi Claudia,

No, It’s not related to the complexity of the regex ! It’s just that the step-by-step replace doesn’t work at all, as soon as the search regex contains, at least, one \K form :-(( Though I don’t know exactly why !?

Consider the subject string below :

abc
abcdef
abcdefghi
abcdefghidefjkl

With the simple S/R SEARCH abc\Kdef and REPLACE 123, if I click on the Replace All button, we get the right text :

abc
abc123
abc123ghi
abc123ghidefjkl

Note that the second string def has not been changed, because it wasn’t just after an abc string. That’s correct !

On the contrary, if I click several times on the Replace button, nothing changes !!!
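
For comparison, the Replace All result can be reproduced with a few lines of Python. Python's standard re module has no \K, so this sketch swaps in a capturing group that is written back in the replacement, which gives the same text for this example.

```python
import re

text = "abc\nabcdef\nabcdefghi\nabcdefghidefjkl"

# No \K in Python's re: capture "abc" and write it back with \g<1>,
# so only the "def" that directly follows "abc" is replaced.
result = re.sub(r'(abc)def', r'\g<1>123', text)

print(result)
```

The \g<1> form is used instead of \1 so that the digits 123 are not read as part of the group number.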

Cheers,

guy038

P.S.:

I’ve just realized that the bug exists too, if we use a look-behind, instead of the \K form !

So, the S/R SEARCH (?<=abc)def and REPLACE 123 does the job, if you click on the Replace All button, ONLY !

Remember that, due to the look-behind feature, this regex tries to match a def string, only if preceded by the string abc
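
The look-behind form can also be tried directly in Python, since the standard re module supports fixed-width look-behinds. This is just a small sketch on the same subject string:

```python
import re

text = "abc\nabcdef\nabcdefghi\nabcdefghidefjkl"

# (?<=abc) checks that "abc" comes right before "def" without including
# it in the match, so only "def" itself is replaced.
result = re.sub(r'(?<=abc)def', '123', text)

print(result)
```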