  • I need to find a string/paragraph that doesn’t have any tags around it in my XML file.

    For example:
    <para>my text</para>
    My text
    <para>my text</para>…

    I want to find the untagged text (the second line above) using a regex.

    Is there a way to get that regex?

    Ganesan G

  • Hello, @ganesan-govindarajan and All,

    To match the text My text, with that exact case, only when it is surrounded by neither the <para> nor the </para> tag, use this regex:

    SEARCH (?-i)(?<!<para>)My text(?!</para>)

    Note that if your search must be case-insensitive, change the leading modifier (?-i) to (?i).
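
    For readers who want to check this outside Notepad++, here is a minimal Python sketch of the same lookaround idea. It assumes Python's built-in re module, which is a different engine from the one used by Notepad++, and the variable names are made up for illustration. The leading (?-i) is dropped because re is case-sensitive by default.

```python
import re

# Same lookarounds as the SEARCH regex above; Python's re is case-sensitive
# by default, so the leading (?-i) modifier is simply omitted.
pattern = re.compile(r"(?<!<para>)My text(?!</para>)")

sample = [
    "<para>My text</para>",
    "My text",
    "<para>My text",
    "My text</para>",
]

# Keep only the lines where the text has neither tag next to it.
hits = [line for line in sample if pattern.search(line)]
print(hits)  # ['My text']
```

    Only the bare line is reported, because a single adjacent tag is enough to make one of the two lookarounds fail.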

    Now, if you want to match the text My text, with that exact case, whenever at least one of the <para> and </para> tags is missing (one of them or both), use either of these regexes:

    • SEARCH (?-i)(?<!<para>)(My text)|(?1)(?!</para>)

    • SEARCH (?-i)<para>My text</para>(*SKIP)(*F)|My text
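
    A rough Python equivalent of these two regexes, as a sketch only. Python's re module supports neither the (?1) subroutine call nor the (*SKIP)(*F) verbs, so two assumptions are made here: the subroutine call is spelled out by repeating My text, and the skip-and-fail trick is emulated by capturing only the wanted alternative and ignoring matches where that group is empty.

```python
import re

sample = ["<para>My text</para>", "My text", "<para>My text", "My text</para>"]

# First regex: (?1) is expanded by hand, since re has no subroutine calls.
either_missing = re.compile(r"(?<!<para>)My text|My text(?!</para>)")

# Second regex: the fully tagged form is consumed by the first alternative;
# only the second alternative captures, which stands in for (*SKIP)(*F).
skip_tagged = re.compile(r"<para>My text</para>|(My text)")

hits_a = [line for line in sample if either_missing.search(line)]
hits_b = [
    line for line in sample
    if any(m.group(1) for m in skip_tagged.finditer(line))
]
print(hits_a == hits_b, hits_a)
```

    Both approaches flag the same three lines of the sample, that is, every line except the correctly tagged one.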

    Finally, if your goal is to correct every faulty case, use the following regex search/replace (S/R):

    • SEARCH (?-i)(<para>)?My text(</para>)?

    • REPLACE (?1:<para>)$0?2:</para>

    Of course, select the Regular expression search mode and, if necessary, tick the Wrap around option.

    Test this S/R against this sample :

    <para>My text</para>
    My text
    <para>My text
    My text</para>

    After replacement, you should obtain the expected result :

    <para>My text</para>
    <para>My text</para>
    <para>My text</para>
    <para>My text</para>
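
    The same repair can be sketched in Python for scripted use. This is an illustration under assumptions, not the Notepad++ mechanism: re.sub has no conditional replacement syntax like (?1:<para>), so a small hypothetical helper, add_missing_tags, decides which tag to add.

```python
import re

# Optional tags are captured so the callback can tell which one is missing.
pattern = re.compile(r"(<para>)?My text(</para>)?")

def add_missing_tags(match):
    """Mimic the replacement (?1:<para>)$0?2:</para> for one match."""
    opening = "" if match.group(1) else "<para>"
    closing = "" if match.group(2) else "</para>"
    return opening + match.group(0) + closing

sample = "<para>My text</para>\nMy text\n<para>My text\nMy text</para>"
fixed = pattern.sub(add_missing_tags, sample)
print(fixed)
```

    Each tag is added only when its capture group did not take part in the match, so correctly tagged lines are left unchanged.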

    


  • Hi @guy038

    Thanks for the help!

    Sorry, “My text” here is only an example. My intention is to find any sentence, like “This is a Notepad++ regex…”, that has no opening and closing tags. The rest of the XML file has opening and closing tags, so I can easily identify that content by its tags.

    Ganesan. G

  • Hi, @ganesan-govindarajan and All,

    Ah… OK. So, whatever the contents of the tags, right?

    Then the following generic regex should work nicely!

    • SEARCH (?-i)<(\w+)>(?2)</\1>(*SKIP)(*F)|(\QWhatever you want\E)

    Note that the part between \Q (for Quote) and \E (for End) is treated as a literal string of characters, so regex metacharacters inside it lose their special meaning.

    So, for a very simple search text such as My text, the \Q and \E markers are not necessary and you may use this simpler regex:

    • SEARCH (?-i)<(\w+)>(?2)</\1>(*SKIP)(*F)|(My text)

    When tested against the text, below :

    01 <para>My text</para>
    02 <blockquote>My text         <!--    MISSING tag     -->
    03 <abc>My text</xyz>          <!-- NON-regular syntax -->
    04 My text                     <!--    MISSING tags    -->
    05 <ganesan>My text</ganesan>
    06 <123>My text<456>           <!-- NON-regular syntax -->
    07 My text</blockquote>        <!--    MISSING tags    -->
    08 <h1>My text</h1>
    09 (toto)My text(/toto)        <!-- NON-regular syntax -->
    10 (Test)My text[/test]        <!-- NON-regular syntax -->

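    it matches the string My text only where the syntax is non-regular or a tag is missing, that is, in lines 02, 03, 04, 06, 07, 09 and 10. In lines 01, 05 and 08, the first alternative consumes the whole well-formed element and (*SKIP)(*F) then discards that match, so the My text inside it is never reported.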
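
    To check the listed line numbers without Notepad++, the following Python sketch emulates the generic regex. Assumptions: Python's re lacks (*SKIP)(*F) and the (?2) forward subroutine call, so the well-formed alternative repeats the text and only the second alternative captures. The helper name find_untagged_lines is hypothetical, and the trailing comments of the sample are left out for brevity.

```python
import re

def find_untagged_lines(lines, phrase="My text"):
    """Return 1-based numbers of lines where phrase lacks a matching tag pair."""
    literal = re.escape(phrase)
    # Alternative 1 consumes <tag>phrase</tag> with identical tag names;
    # alternative 2 captures any occurrence that is left over.
    pattern = re.compile(r"<(\w+)>" + literal + r"</\1>|(" + literal + r")")
    return [
        number
        for number, line in enumerate(lines, start=1)
        if any(m.group(2) for m in pattern.finditer(line))
    ]

sample = [
    "<para>My text</para>",
    "<blockquote>My text",
    "<abc>My text</xyz>",
    "My text",
    "<ganesan>My text</ganesan>",
    "<123>My text<456>",
    "My text</blockquote>",
    "<h1>My text</h1>",
    "(toto)My text(/toto)",
    "(Test)My text[/test]",
]
print(find_untagged_lines(sample))  # [2, 3, 4, 6, 7, 9, 10]
```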

    Similarly, if you’re looking for badly tagged occurrences of the sentence This is a Notepad++ regex., use the \Q…\E form below, because the search text contains the + and . characters, which have a special meaning in regexes:

    • SEARCH (?-i)<(\w+)>(?2)</\1>(*SKIP)(*F)|(\QThis is a Notepad++ regex.\E)

    Test it against this similar sample :

    01 <para>This is a Notepad++ regex.</para>
    02 <blockquote>This is a Notepad++ regex.         <!--    MISSING tag     -->
    03 <abc>This is a Notepad++ regex.</xyz>          <!-- NON-regular syntax -->
    04 This is a Notepad++ regex.                     <!--    MISSING tags    -->
    05 <ganesan>This is a Notepad++ regex.</ganesan>
    06 <123>This is a Notepad++ regex.<456>           <!-- NON-regular syntax -->
    07 This is a Notepad++ regex.</blockquote>        <!--    MISSING tags    -->
    08 <h1>This is a Notepad++ regex.</h1>
    09 (toto)This is a Notepad++ regex.(/toto)        <!-- NON-regular syntax -->
    10 (Test)This is a Notepad++ regex.[/test]        <!-- NON-regular syntax -->
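
    For a scripted check of this second sample, a Python sketch could rely on re.escape, which plays a role similar to \Q…\E by neutralising the + and . characters. As before, this assumes Python's re module rather than the Notepad++ engine, emulates (*SKIP)(*F) with a capture group, and uses a shortened sample plus one made-up near miss.

```python
import re

sentence = "This is a Notepad++ regex."
# re.escape makes + and . literal, much as \Q...\E does in the regex above.
literal = re.escape(sentence)
pattern = re.compile(r"<(\w+)>" + literal + r"</\1>|(" + literal + r")")

sample = [
    "<para>This is a Notepad++ regex.</para>",
    "<abc>This is a Notepad++ regex.</xyz>",
    "This is a Notepad++ regex.",
    "This is a Notepadd regexp",  # not a literal occurrence of the sentence
]
flagged = [s for s in sample if any(m.group(2) for m in pattern.finditer(s))]
print(flagged)
```

    Only the mismatched-tag line and the bare line are flagged; the near miss is ignored because the sentence is matched character for character.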

    


