Need to find a string that is not inside any tags

ganesan govindarajan

I need to find a string/paragraph in my XML file when it is not enclosed in any tags.

For example:
<para>my text</para>
My text
<para>my text</para>…

I want to find that bolded text (the untagged line) using a regex.

Is there a way to get that regex?

Thanks
Ganesan G

guy038

Hello, @ganesan-govindarajan and All,

To match the text My text, with that exact case, when it has neither the <para> tag before it nor the </para> tag after it, use the regex:

SEARCH (?-i)(?<!<para>)My text(?!</para>)

Note that if your search must be case-insensitive, change the leading modifier (?-i) to (?i).
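
To check the lookaround logic outside Notepad++, here is a minimal sketch in Python. It assumes the standard re module, which accepts the same fixed-width lookbehind and lookahead but not a bare (?-i) modifier, so the modifier is dropped. The sample text and the helper name untagged_lines are made up for illustration.

```python
import re

# Same lookarounds as the Notepad++ regex, without the bare (?-i) modifier,
# which Python's re module does not accept (it is case-sensitive by default).
PATTERN = re.compile(r"(?<!<para>)My text(?!</para>)")

def untagged_lines(text):
    """Return 1-based numbers of lines holding 'My text' with neither tag."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]

sample = "<para>My text</para>\nMy text\n<para>My text\nMy text</para>"
print(untagged_lines(sample))  # [2]
```

Only the bare second line is reported, because a single tag on either side is enough to make one of the two lookarounds fail. For a case-insensitive search in this sketch, passing re.IGNORECASE to re.compile would play the role of (?i).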

Now, if you want to match the text My text, with that exact case, whenever it is not fully enclosed, that is, when the <para> tag, the </para> tag, or both are missing, use either of these regexes:

SEARCH (?-i)(?<!<para>)(My text)|(?1)(?!</para>)
SEARCH (?-i)<para>My text</para>(*SKIP)(*F)|My text
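
Python's re module has neither (*SKIP)(*F) nor the (?1) subroutine call, so the sketch below swaps in a different technique to get the same result: the correctly wrapped form is consumed by a first alternative that captures nothing, and only the second alternative fills group 1. The helper name badly_tagged_lines and the sample are illustrative assumptions.

```python
import re

# The first alternative swallows the well-formed case without capturing;
# a non-empty group 1 therefore means "at least one tag is missing".
PATTERN = re.compile(r"<para>My text</para>|(My text)")

def badly_tagged_lines(text):
    """Return 1-based numbers of lines where 'My text' lacks a tag."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(m.group(1) for m in PATTERN.finditer(line)):
            found.append(number)
    return found

sample = "<para>My text</para>\nMy text\n<para>My text\nMy text</para>"
print(badly_tagged_lines(sample))  # [2, 3, 4]
```

This is an emulation, not the Notepad++ mechanism itself: in Notepad++ the skipped text is simply never reported, while here it is matched and then filtered out by checking the group.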

Finally, if your goal is to correct all the possible wrong syntaxes, use the following regex S/R:

SEARCH (?-i)(<para>)?My text(</para>)?
REPLACE (?1:<para>)$0?2:</para>

Of course, select the Regular expression search mode and tick, if necessary, the Wrap around option.

Test this S/R against this sample :

<para>My text</para>
My text
<para>My text
My text</para>

After replacement, you should obtain the expected result :

<para>My text</para>
<para>My text</para>
<para>My text</para>
<para>My text</para>
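
The same conditional replacement can be sketched in Python for comparison. Python's re module has no (?1:...) conditional in the replacement string, so this version uses a replacement function instead. The function names fix and add_missing_tags are illustrative, not part of any tool.

```python
import re

# Both tags are optional in the search, exactly as in the S/R above.
PATTERN = re.compile(r"(<para>)?My text(</para>)?")

def fix(match):
    # Add each tag only when its optional group did not take part in the match.
    opening = "" if match.group(1) else "<para>"
    closing = "" if match.group(2) else "</para>"
    return opening + match.group(0) + closing

def add_missing_tags(text):
    return PATTERN.sub(fix, text)

sample = "<para>My text</para>\nMy text\n<para>My text\nMy text</para>"
print(add_missing_tags(sample))
```

Each of the four sample lines comes out as <para>My text</para>, and running the function a second time changes nothing, since every occurrence is then fully wrapped.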

Best Regards

guy038

ganesan govindarajan

Hi @guy038

Thanks for the help!

Sorry, “My text” was only an example. My intention is to find any sentence, like “This is a Notepad++ regex…”, that has no opening and closing tags around it. The rest of the XML file has opening and closing tags, which I can easily identify.

Thanks
Ganesan. G

guy038

Hi, @ganesan-govindarajan and All,

Ah… OK. So, whatever the contents of the tags, is that right?

Then the following generic regex should work nicely!

SEARCH (?-i)<(\w+)>(?2)</\1>(*SKIP)(*F)|(\QWhatever you want\E)

Note that the part between \Q (for Quote) and \E (for End) is treated as a literal string of characters!

So, for a very simple search text such as My text, the \Q and \E syntaxes are not necessary and you may use this practical regex:

SEARCH (?-i)<(\w+)>(?2)</\1>(*SKIP)(*F)|(My text)

When tested against the text, below :

01 <para>My text</para>
02 <blockquote>My text         <!--    MISSING tag     -->
03 <abc>My text</xyz>          <!-- NON-regular syntax -->
04 My text                     <!--    MISSING tags    -->
05 <ganesan>My text</ganesan>
06 <123>My text<456>           <!-- NON-regular syntax -->
07 My text</blockquote>        <!--    MISSING tags    -->
08 <h1>My text</h1>
09 (toto)My text(/toto)        <!-- NON-regular syntax -->
10 (Test)My text[/test]        <!-- NON-regular syntax -->

it matches the string My text only where the syntax is non-regular or a tag is missing, that is, in lines 02, 03, 04, 06, 07, 09 and 10!
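
That list of lines can be double-checked with a short Python sketch. Since Python's re module lacks (*SKIP)(*F), it again emulates the idea: a first alternative with the \1 backreference consumes any correctly paired <tag>…</tag>, and only the second alternative captures. The helper name flagged_lines is an assumption for illustration.

```python
import re

# <(\w+)> ... </\1> consumes a properly paired tag; group 2 is set only
# when 'My text' is found outside such a pair.
PATTERN = re.compile(r"<(\w+)>My text</\1>|(My text)")

SAMPLE = """\
01 <para>My text</para>
02 <blockquote>My text         <!--    MISSING tag     -->
03 <abc>My text</xyz>          <!-- NON-regular syntax -->
04 My text                     <!--    MISSING tags    -->
05 <ganesan>My text</ganesan>
06 <123>My text<456>           <!-- NON-regular syntax -->
07 My text</blockquote>        <!--    MISSING tags    -->
08 <h1>My text</h1>
09 (toto)My text(/toto)        <!-- NON-regular syntax -->
10 (Test)My text[/test]        <!-- NON-regular syntax -->
"""

def flagged_lines(text):
    """Return the two-digit labels of lines where group 2 matched."""
    return [
        line[:2]
        for line in text.splitlines()
        if any(m.group(2) for m in PATTERN.finditer(line))
    ]

print(flagged_lines(SAMPLE))
# ['02', '03', '04', '06', '07', '09', '10']
```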

Similarly, if you’re looking for wrong syntaxes around the sentence This is a Notepad++ regex., it’s better to use the \Q…\E syntax below, because the text to search for contains the + and . characters, which have a special meaning in regexes:

SEARCH (?-i)<(\w+)>(?2)</\1>(*SKIP)(*F)|(\QThis is a Notepad++ regex.\E)

Test it against this similar sample :

01 <para>This is a Notepad++ regex.</para>
02 <blockquote>This is a Notepad++ regex.         <!--    MISSING tag     -->
03 <abc>This is a Notepad++ regex.</xyz>          <!-- NON-regular syntax -->
04 This is a Notepad++ regex.                     <!--    MISSING tags    -->
05 <ganesan>This is a Notepad++ regex.</ganesan>
06 <123>This is a Notepad++ regex.<456>           <!-- NON-regular syntax -->
07 This is a Notepad++ regex.</blockquote>        <!--    MISSING tags    -->
08 <h1>This is a Notepad++ regex.</h1>
09 (toto)This is a Notepad++ regex.(/toto)        <!-- NON-regular syntax -->
10 (Test)This is a Notepad++ regex.[/test]        <!-- NON-regular syntax -->
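
For readers scripting this outside Notepad++: Python's re module does not support \Q…\E, but re.escape does a comparable job of quoting a literal. The sketch below is an approximation under that assumption. The helper name untagged_pattern is hypothetical, and the (*SKIP)(*F) part is again emulated with a non-capturing first alternative.

```python
import re

def untagged_pattern(literal):
    """Build a regex whose group 2 is set only for untagged occurrences."""
    quoted = re.escape(literal)  # quotes '+' and '.', much like \\Q...\\E
    return re.compile(r"<(\w+)>" + quoted + r"</\1>|(" + quoted + ")")

def is_untagged(pattern, line):
    return any(m.group(2) for m in pattern.finditer(line))

pattern = untagged_pattern("This is a Notepad++ regex.")

lines = [
    "<para>This is a Notepad++ regex.</para>",
    "This is a Notepad++ regex.",
    "<abc>This is a Notepad++ regex.</xyz>",
    "This is a Notepad+ regexx",
]
print([is_untagged(pattern, line) for line in lines])
# [False, True, True, False]
```

The last sample line shows why the quoting matters: once the + and . are treated literally, a near miss such as "Notepad+ regexx" is no longer matched.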

Best Regards,

guy038