RegEx help with keeping TEXT and replacing formatting codes only.



  • Regular expression coding is way over my head. However, if someone can please help me with this request, I will be well on my way.

    I have many lines of text that have formatting codes before and after the text, for example…
    {\i\cf15 I love Andrea}
    {\i\cf15 I love Betty}
    {\i\cf15 I love Cathy}
    etc…

    I want to replace formatting so that the new lines of text will be…
    <i>I love Andrea</i>
    <i>I love Betty</i>
    <i>I love Cathy</i>

    I know it could be done in two steps, like…
    Search for all {\i\cf15 , and replace with <i>
    Search for all } , and replace with </i>

    However, this will interfere with other similar codes, so if possible, I need to do this search and replace in one step.
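
    To make the interference concrete, here is a small Python sketch of the two-step approach. The second group and its \b\cf3 code are invented for illustration; they do not come from the actual file.

```python
# Hypothetical input: the second group uses a different (invented) code, \b\cf3.
line = r"{\i\cf15 I love Andrea} {\b\cf3 I love Betty}"

# Step 1 replaces the opening code, step 2 replaces EVERY closing brace.
two_step = line.replace(r"{\i\cf15 ", "<i>").replace("}", "</i>")

print(two_step)
```

    The unrelated group ends up with a stray </i> and no matching <i>, which is why a single search that ties the opening code to its own closing brace is safer.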

    I will appreciate anyone's help on this. Thanks.



  • This is so easy. Have you ever looked up regex syntax?
    Find: \Q{\i\cf15\E(.+?)\Q}\E
    Replace with: <i>\1</i>
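
    For anyone who wants to try the same idea outside Notepad++, below is a rough Python sketch. Python's re module has no \Q…\E quoting, so re.escape() is used instead. It also assumes that the space after cf15 belongs to the literal prefix, so that the output matches the lines requested in the question.

```python
import re

# re.escape() stands in for \Q...\E, which Python's re module does not support.
# Assumption: the space after "cf15" is part of the prefix, so it is not captured.
prefix = re.escape(r"{\i\cf15 ")
suffix = re.escape("}")
pattern = re.compile(prefix + r"(.+?)" + suffix)

lines = [
    r"{\i\cf15 I love Andrea}",
    r"{\i\cf15 I love Betty}",
    r"{\i\cf15 I love Cathy}",
]

# \1 in the replacement re-inserts the text captured by group 1.
converted = [pattern.sub(r"<i>\1</i>", line) for line in lines]
for line in converted:
    print(line)
```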



  • You said “This is so easy” LOL :)

    Thank you so much. You are a star!

    It worked, but I would not have worked it out for myself in 100 years. This stuff is way over my head.

    I’m going to make a BIG NOTE of this, so I never have trouble with it again.

    Thanks again.



  • Hello @elija-5801 and All,

    Hi, @elija-5801, the @古旮 regex S/R is easy enough to understand :

    • Everything between the regex boundaries \Q and \E is taken as a literal string. So, the regex engine first searches for the literal {\i\cf15 string

    • At the end of the regex, the \Q}\E syntax again looks for the literal } symbol

    • And, between these literal strings, the regex engine tries to match the (.+?) regex, which represents :

      • Any single standard character ( assuming the . matches newline option is not set )

      • Present one or more times, due to the + quantifier ( shorthand for {1,} )

      • Till the very first occurrence of the } symbol, because the ? symbol, placed right after the + quantifier, makes it lazy

      • All the characters caught are stored, as group 1, thanks to the parentheses

    • In replacement, it re-writes :

      • The literal string <i>

      • All the characters of group 1 ( I love Andrea, I love Betty, I love Cathy ), because of the \1 syntax

      • The literal string </i>
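
    The effect of the lazy ? is easiest to see on a line holding two groups. The sketch below is in Python rather than Notepad++, and the two-group line is a made-up example; as before, re.escape() replaces \Q…\E and the space after cf15 is assumed to be part of the literal prefix.

```python
import re

# Hypothetical line with two formatted groups, to expose the difference.
line = r"{\i\cf15 I love Andrea} and {\i\cf15 I love Betty}"

prefix = re.escape(r"{\i\cf15 ")  # assumption: trailing space kept in the literal
lazy = re.compile(prefix + r"(.+?)\}")   # stops at the FIRST closing brace
greedy = re.compile(prefix + r"(.+)\}")  # runs on to the LAST closing brace

lazy_result = lazy.sub(r"<i>\1</i>", line)
greedy_result = greedy.sub(r"<i>\1</i>", line)

print(lazy_result)
print(greedy_result)
```

    The lazy version converts both groups separately, while the greedy version swallows everything from the first opening code to the last } into a single group 1.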


    Here is, below, my alternative solution :

    SEARCH (?i-s)\{([^a-z].*?\h+)?|(\h*\})

    REPLACE ?2</i>:<i>

    With my version :

    • The braces may be followed or preceded by some horizontal blank characters ( Space or Tabulation chars )

    • I assumed that the first part ( the formatting code part ), if present, must :

      • Begin with a NON-letter character, whatever its case

      • End with, at least, one horizontal blank char ( Space or Tabulation )


    Notes :

    • In the searched regex :

      • First, the (?i-s) modifiers force the regex engine to consider that :

        • The search is case insensitive

        • The . dot stands for any single standard character, line-break characters excluded

      • Then, the regex engine looks for either of the two alternatives, separated by the | symbol :

        • The regex \{([^a-z].*?\h+)? which tries to catch a literal { char, followed by the optional block [^a-z].*?\h+ ( due to the ? quantifier ), which represents, itself, a non-letter char, followed by the smallest range of standard characters, possibly empty, and ended with some horizontal blank characters

        • The regex (\h*\}) looks for possible horizontal blank characters, followed by a literal } symbol, stored as group 2, as embedded within parentheses

    • The replacement regex ?2</i>:<i> is a conditional replacement structure, which rewrites :

      • The literal string </i> if group 2 exists

      • The literal string <i> if group 2 does not exist
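
    Which alternative matched, and therefore whether group 2 exists, can be inspected with a short Python sketch. Python's re module does not know \h, so [ \t] is used as an approximation ( space or tab only ), and the re.IGNORECASE flag plays the role of (?i).

```python
import re

# Assumptions: [ \t] approximates Boost's \h, and re.IGNORECASE replaces (?i).
# DOTALL is off by default, which corresponds to (?-s).
pattern = re.compile(r"\{([^a-z].*?[ \t]+)?|([ \t]*\})", re.IGNORECASE)

line = r"{\i\cf15 I love Andrea}"

# For each match, record the matched text and whether group 2 took part in it.
matches = [(m.group(0), m.group(2) is not None) for m in pattern.finditer(line)]

for text, has_group2 in matches:
    print(repr(text), "group 2 exists" if has_group2 else "group 2 does not exist")
```

    The opening part is matched by the first alternative, so group 2 is absent there, and the closing brace is matched by the second alternative, so group 2 is set.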


    So, for instance, with the text, below :

    {\i\cf15 I love Andrea}
    {\Te-12  	   	  I love Betty}
    {I love Cathy   	  }
    {    I love Marie}
    {789 I love Suzan}
    

    You would obtain :

    <i>I love Andrea</i>
    <i>I love Betty</i>
    <i>I love Cathy</i>
    <i>I love Marie</i>
    <i>I love Suzan</i>
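
    The same run can be approximated in Python. This is only a sketch under a few assumptions : [ \t] replaces \h, re.IGNORECASE replaces (?i), and, since Python has no ?2…:… conditional in replacement strings, a small callback ( the name replace_tag is arbitrary ) makes the same decision.

```python
import re

# Assumptions: [ \t] approximates Boost's \h, and re.IGNORECASE replaces (?i).
pattern = re.compile(r"\{([^a-z].*?[ \t]+)?|([ \t]*\})", re.IGNORECASE)

def replace_tag(match):
    # Group 2 took part in the match: this is a closing brace.
    if match.group(2) is not None:
        return "</i>"
    # Otherwise the first alternative matched: this is an opening brace.
    return "<i>"

sample = [
    r"{\i\cf15 I love Andrea}",
    "{\\Te-12  \t   \t  I love Betty}",
    "{I love Cathy   \t  }",
    r"{    I love Marie}",
    r"{789 I love Suzan}",
]

result = [pattern.sub(replace_tag, line) for line in sample]
for line in result:
    print(line)
```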
    

    Cheers,

    guy038

    P.S. :

    For newcomers to regular expression concepts and syntax, begin with this article in the N++ Wiki :

    http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

    In addition, you’ll find good documentation about the Boost C++ Regex library, v1.55.0 ( similar to Perl regular expressions, v5.8 ), used by Notepad++ since its 6.0 version, at the TWO addresses below :

    http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


    You may also find valuable information on the sites below :

    http://www.regular-expressions.info

    http://www.rexegg.com

    http://perldoc.perl.org/perlre.html

    Be aware that, as with any documentation, it may contain some errors ! Anyway, if you detect one, that’s good news : you’re improving ;-))



  • Hello @guy038

    Your extensive reply goes above and beyond what I expected, and is very much appreciated.

    I have read your examples and expanded notes, and I am now so much wiser about regular expressions.

    Thank you for taking the time to help out a stranger in another part of the world.

    Thank you. :)

