How to remove characters () and text inside

Drew

Hello. Please tell me how to remove part of the text enclosed in brackets (), together with the brackets themselves, INSIDE THE TAGS.
For example, we have: <tag>Text 1 (Text 2)</tag>
I need to have:
<tag>Text 1</tag>

I found the expression \(.*\) (through regular expressions).
But it replaces matches throughout the whole document.
I only need to delete (Text 2) inside the tag <tag>.
Thank you.
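To illustrate the problem, here is a minimal sketch using Python's `re` module (not Notepad++'s own regex engine, though this pattern behaves the same way in both). The sample text is hypothetical: one parenthetical sits outside any tag and one sits inside <tag>. The pattern \(.*\) removes both, because nothing in it refers to the tag.

```python
import re

# Hypothetical sample: one "(...)" outside any tag, one inside <tag>.
text = "Intro (note)\n<tag>Text 1 (Text 2)</tag>"

# \(.*\) matches any "(...)" on a line, wherever it appears.
# "." does not match a newline here, so each line is handled separately.
everywhere = re.sub(r"\(.*\)", "", text)
print(everywhere)
```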

asvc

A bit of a brute-force approach, but will do the trick:

1. Split into 3 groups:
(\<\w*\>.*)\(.*\)(\<\/\w*\>)
2. Join two groups:
\1\2

Drew

@asvc said in How to remove characters () and text inside:

A bit of a brute-force approach, but will do the trick:

1. Split into 3 groups:
(\<\w*\>.*)\(.*\)(\<\/\w*\>)
2. Join two groups:
\1\2

Sorry, but I didn’t understand.
Should I find something like:
(<tag>.)(.)(</tag>)
And replace with:
<tag>\1\2</tag>

I didn’t understand what to write to find and replace.

PeterJones

@Drew said in How to remove characters () and text inside:

I didn’t understand what to write to find and replace.

Literally the text that @asvc wrote. Well, almost. The plaintext-highlighted string in #1 was intended as the FIND, and the plaintext-highlighted string in #2 was the REPLACE. What was implied, but not said, was that you had to use regular expression mode.

However, the FIND expression had a bug, in that \< does not mean a literal < in Notepad++'s regular expressions: it means “anchor to the beginning of a word”, and similarly for \> meaning “end of a word”, so it did not match your example text. As @guy038 has tried to teach me (and I occasionally remember), don’t over-escape your regex, either. A less-escaped regex which I tested on your example text is (<\w*>.*)\(.*\)(</\w*>) . With this regex, and the text

<tag>Text 1 (Text 2)</tag>
<another>Text 1 (Text 2)</another>
<tag>Text 1 (Text 2)</tag>

it will transform to

<tag>Text 1 </tag>
<another>Text 1 </another>
<tag>Text 1 </tag>
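The same FIND/REPLACE pair can be checked outside Notepad++ with a minimal Python sketch. This assumes Python's `re` behaves like Notepad++'s engine for this particular pattern (it does for `\w`, `.*`, escaped parentheses and `\1\2` back-references; the two engines differ in other areas, such as `\<`). Group 1 captures the opening tag and the text before "(", group 2 captures the closing tag, and the literal "(...)" between them is matched but not captured, so it is dropped.

```python
import re

text = (
    "<tag>Text 1 (Text 2)</tag>\n"
    "<another>Text 1 (Text 2)</another>\n"
    "<tag>Text 1 (Text 2)</tag>"
)

# Group 1: any opening tag plus the text before "(".
# Group 2: any closing tag. The "(...)" in between is not captured.
find = r"(<\w*>.*)\(.*\)(</\w*>)"
replace = r"\1\2"

result = re.sub(find, replace, text)
print(result)
```

Note that the space before "(" survives, which is why the output lines end in "Text 1 " rather than "Text 1".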

Note: @asvc made it extremely generic, in that it will match any tag – and my fix above will, too. If, instead, you really want it to only remove the (...) from a specific tag (we’ll assume tag), then use a FIND of (<tag>.*)\(.*\)(</tag>), and the same REPLACE from #2. With this regex and the example text I showed, it will transform to

<tag>Text 1 </tag>
<another>Text 1 (Text 2)</another>
<tag>Text 1 </tag>

showing that it only changed the contents of <tag>, rather than any tag.
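A minimal Python sketch of the tag-specific version, under the same assumption that Python's `re` treats this pattern the way Notepad++ does. Because the tag name is spelled out, only the <tag> lines are changed.

```python
import re

text = (
    "<tag>Text 1 (Text 2)</tag>\n"
    "<another>Text 1 (Text 2)</another>\n"
    "<tag>Text 1 (Text 2)</tag>"
)

# Same idea as the generic pattern, but only <tag>...</tag> can match.
result = re.sub(r"(<tag>.*)\(.*\)(</tag>)", r"\1\2", text)
print(result)
```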

If your example data is not representative of your actual data, then these regular expressions will likely not work for you. Regexes often need to be changed depending on the context of the other nearby characters.
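As one concrete case of the point above, here is a minimal Python sketch using a hypothetical line where the parenthetical is not directly before the closing tag. The thread's pattern requires ")" to be followed immediately by "</tag>", so it leaves this line untouched. The adjusted pattern is an assumption for illustration, not something from the thread: it allows trailing text after ")" and uses [^()]* so the removed span stops at the first ")". It still removes only the first parenthetical in the tag.

```python
import re

# Hypothetical line: text follows the parenthetical inside <tag>.
line = "<tag>Text 1 (Text 2) Text 3</tag>"

# The thread's pattern needs ")" immediately before "</tag>": no match here.
unchanged = re.sub(r"(<tag>.*)\(.*\)(</tag>)", r"\1\2", line)

# Assumed adjustment: keep any text between ")" and "</tag>" in group 2.
# The spaces on both sides of "(Text 2)" survive, leaving a double space.
adjusted = re.sub(r"(<tag>[^(]*)\([^()]*\)(.*</tag>)", r"\1\2", line)

print(unchanged)
print(adjusted)
```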

asvc

@PeterJones said in How to remove characters () and text inside:

\< does not mean a literal < in Notepad++'s regular expressions

That is interesting! I was under the impression it was more or less standard PCRE; however, reading the link you provided:

\< ⇒ This matches the start of a word using Scintilla’s definitions of words.
\> ⇒ This matches the end of a word using Scintilla’s definition of words.

Good to know.

Alan Kilborn

@asvc said in How to remove characters () and text inside:

Good to know.

Yes, but be aware that there may be a bug with it in certain circumstances.
See HERE.
My extra notes on that indicate the bug is in the N++ regex engine and is not specific to Pythonscript (which is maybe what the link leads one to believe).
Anyway, if you choose to use \< and \>, YMMV. :-)

Drew

@PeterJones

Thank you so much! Yes it works.
I chose to find: (<tag>.*)\(.*\)(</tag>)
And replace with: \1\2 or $1$2
With the Regular expression search mode checked.