RegEx Expression help with keeping TEXT and replacing formatting codes only.

  • This RegEx Expression coding is way over my head, however if someone can please help me with this request, I will be well on my way.

    I have many lines of text strings which have before and after formatting, for example…
    {\i\cf15 I love Andrea}
    {\i\cf15 I love Betty}
    {\i\cf15 I love Cathy}

    I want to replace formatting so that the new lines of text will be…
    <i>I love Andrea</i>
    <i>I love Betty</i>
    <i>I love Cathy</i>

    I know it could be done in two steps, like…
    Search for all {\i\cf15 , and replace with <i>
    Search for all } , and replace with </i>

    However this will interfere with other similar codes, so if possible, I need to do this search and replace procedure in one step.

    I will appreciate anyone’s help on this. Thanks.

  • This is so easy. You ever searched regex syntax?
    Find: \Q{\i\cf15\E(.+?)\Q}\E
    Replace with: <i>\1</i>

  • You said “This is so easy” LOL :)

    Thank you so much. You are a star!

    It worked, but I would not have worked it out for myself in a 100 years. This stuff is way over my head.

    I’m going to make a BIG NOTE of this, so I never have trouble with it again.

    Thanks again.

  • Hello @elija-5801 and All,

    Hi, @elija-5801, the @古旮 regex S/R is easy enough to understand :

    • Everything between the regex boundaries \Q and \E is taken as a literal string. So, the regex engine searches, first, for the literal {\i\cf15 string

    • At the end of the regex, again, the \Q}\E syntax looks for the literal } symbol

    • And, between these literal strings, the regex engine tries to match the (.+?) regex, which represents :

      • Any single standard character ( assuming the . matches newline option is not set )

      • Present one or more times, due to the + quantifier, ( shortened syntax of {1,} )

      • Till the very first occurrence of the } symbol, because the ? symbol, placed after the + quantifier, makes it lazy ( it matches as few characters as possible )

      • All the characters caught are stored, as group 1, thanks to the parentheses

    • In replacement, it re-writes :

      • The literal string <i>

      • All the characters of group 1 ( I love Andrea, I love Betty, I love Cathy ), because of the \1 syntax

      • The literal string </i>
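
    For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of the search and replacement explained above. Python's re module does not support \Q and \E, so re.escape() is used to build the literal parts instead. Two things are assumptions of this sketch and not part of the thread: the space after cf15 is included in the literal prefix ( taken literally, the posted pattern would keep that space at the start of group 1 ), and the second sample line, with its \b\cf3 code, is a hypothetical "other similar code" that should be left alone.

```python
import re

# Python's re module has no \Q...\E, so re.escape() builds the literal parts.
# Assumption: the space after "cf15" is part of the literal prefix, so that it
# does not end up at the start of group 1.
PREFIX = re.escape("{\\i\\cf15 ")
SUFFIX = re.escape("}")
ITALIC = re.compile(PREFIX + r"(.+?)" + SUFFIX)


def to_italic_tags(text: str) -> str:
    """Rewrite {\\i\\cf15 ...} spans as <i>...</i> and leave other text alone."""
    return ITALIC.sub(r"<i>\1</i>", text)


lines = [
    "{\\i\\cf15 I love Andrea}",
    "{\\b\\cf3 Not italic}",  # hypothetical other code, should stay unchanged
]
for line in lines:
    print(to_italic_tags(line))
```

    Because the opening code and the closing brace are matched together, in one pass, a closing brace that belongs to a different code is never touched, which is the problem the two-step approach would run into.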

    Here is, below, my alternative solution :

    SEARCH (?i-s)\{([^a-z].*?\h+)?|(\h*\})

    REPLACE ?2</i>:<i>

    With my version :

    • The braces may be followed or preceded by some horizontal blank characters ( Space or Tabulation chars )

    • I supposed that the first part ( the reference part ! ), if present, must :

      • Begin with a NON-letter character, whatever its case

      • End with, at least, one horizontal blank char ( Space or Tabulation )

    Notes :

    • In the searched regex :

      • First, the (?i-s) modifiers force the regex engine to consider that :

        • The search is case insensitive

        • The . dot stands for any single standard character ( so, not a line-break character )

      • Then, the regex engine looks for, either, any alternative, separated by the | symbol :

        • The regex \{([^a-z].*?\h+)? which tries to catch a literal { char, followed by the optional block [^a-z].*?\h+ ( due to the ? quantifier ), which represents, itself, a non-letter char, followed by the smallest range of standard characters, even null, ended with some horizontal blank characters

        • The regex (\h*\}) looks for possible horizontal blank characters, followed by a literal } symbol, stored as group 2, as embedded within parentheses

    • The replacement regex ?2</i>:<i> is a conditional replacement structure, which rewrites :

      • The literal string </i> if group 2 exists

      • The literal string <i> if group 2 does not exist

    So, for instance, with the text, below :

    {\i\cf15 I love Andrea}
    {\Te-12  	   	  I love Betty}
    {I love Cathy   	  }
    {    I love Marie}
    {789 I love Suzan}

    You would obtain :

    <i>I love Andrea</i>
    <i>I love Betty</i>
    <i>I love Cathy</i>
    <i>I love Marie</i>
    <i>I love Suzan</i>
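
    As a cross-check outside Notepad++, the alternative search and its conditional replacement can be approximated in Python. This is only a sketch under stated assumptions: Python's re module has neither \h nor the ?2</i>:<i> replacement syntax, so \h is written as [ \t], the case-insensitive [^a-z] is spelled out as [^A-Za-z], and the conditional part is emulated with a small function ( the names BRACES, pick_tag and convert are made up for this example ).

```python
import re

# Approximate translation of (?i-s)\{([^a-z].*?\h+)?|(\h*\}):
#   \h           -> [ \t]       (space or tab)
#   (?i)[^a-z]   -> [^A-Za-z]   (non-letter, whatever its case)
#   (?-s)        -> Python's default: the dot does not match a line-break
BRACES = re.compile(r"\{([^A-Za-z].*?[ \t]+)?|([ \t]*\})")


def pick_tag(match):
    """Emulate ?2</i>:<i> : closing tag if group 2 took part in the match."""
    return "</i>" if match.group(2) is not None else "<i>"


def convert(text: str) -> str:
    """Replace every opening or closing brace construct with its tag."""
    return BRACES.sub(pick_tag, text)


sample = [
    "{\\i\\cf15 I love Andrea}",
    "{\\Te-12  \t  I love Betty}",
    "{I love Cathy  \t }",
    "{    I love Marie}",
    "{789 I love Suzan}",
]
for line in sample:
    print(convert(line))
```

    A function is used for the replacement because Python's replacement strings cannot test whether a group exists; the function simply checks group 2 for each match.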



    P.S. :

    For noob people, about regular expressions concept and syntax, begin with that article, in N++ Wiki :

    In addition, you’ll find good documentation, about the Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v5.8 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part

    You may, also, look for valuable information, on the sites, below :

    Be aware that, as any documentation, it may contain some errors ! Anyway, if you detected one, that’s good news : you’re improving ;-))

  • Hello @guy038

    Your extensive reply on posts is over and beyond what I expected, and very much appreciated.

    I have read your examples and expanded notes, and I am now so much wiser to RegEx Expressions.

    Thank you for taking out the time to help out a stranger in another part of the world.

    Thank you. :)

  • Almost 2 years ago on 10 March 2018, I asked the community for help on using EXPRESSION mode.
    Within 3 days I got replies from several people.
    I documented this page to remind me of the ANSWERS and the MEMBERS who replied.
    It’s now nearly two years later, and I consider myself a PRO with the amount of work I achieved.
    The amount of work which would normally take me a year to complete, was reduced to just a few seconds.
    The little you taught me that day in March 2018 opened more doors to more wisdom using the EXPRESSION mode, and I have achieved even far greater things in the last two years.
    If it were not for the great outpouring of help from volunteers like the people who helped me on my way in March 2018, I would still be living in the dark ages labouring away, the long way.
    Thank you! You have no concept of what I achieved by the little SEED of wisdom you planted in me that day.
