Regex to delete sections of XML
-
How can I use a regex to find and delete all XML strings/units that contain approved=“yes”?
I have tried this. It finds the sections that have “yes”, but it becomes greedy when a section doesn't:
<trans.?(approved=“yes”).?unit>?\n

Find:
<trans-unit id=“1” identifier=“e4c7” approved=“yes”>
<source>Hello world
</trans-unit>

Ignore:
<trans-unit id=“5” identifier=“e4c7” approved=“no”>
<source>Welcome to the world
</trans-unit>
-
@Christopher-Phillips
This should help.

Find what:
^(<trans.+?approved=)(?=“yes”)(?s).+?trans-unit>\R

Replace with: nothing in this field (leave the field empty)

This will find those occurrences with the parameter “yes”. With the Replace field left blank, it will remove them. Note that the regex ends with \R, which matches the carriage return/line feed at the end of the line, so you don't finish up with extra blank lines afterwards. Also note I started the regex with a ^, meaning start of line. I assume these XML strings will ALL start at the beginning of a line. Finally, the quote characters (“ and ”) need to be exactly as they are in your file. If it doesn't work initially, change them in my regex to match your XML strings.

Terry
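If you want to check Terry's regex outside Notepad++, here is a rough Python sketch. Python's re module has no \R escape and rejects a (?s) in the middle of a pattern. As an assumption about equivalence, those parts are rewritten as \r?\n and a scoped (?s:...). The ^ anchor needs re.MULTILINE. The function name delete_approved is just for illustration.

```python
import re

# Terry's regex, adapted (assumption) to Python's re syntax:
#   ^          -> re.MULTILINE so ^ matches at every line start
#   (?s).+?    -> scoped (?s:.+?), since Python only accepts global flags at the start
#   \R         -> \r?\n, covering both CRLF and LF line endings
TERRY = re.compile(r'^(<trans.+?approved=)(?=“yes”)(?s:.+?trans-unit>)\r?\n', re.MULTILINE)


def delete_approved(text: str) -> str:
    """Remove every unit whose opening line contains approved=“yes” (curly quotes)."""
    return TERRY.sub('', text)


sample = (
    '<trans-unit id=“1” identifier=“e4c7” approved=“yes”>\n'
    '<source>Hello world\n'
    '</trans-unit>\n'
    '<trans-unit id=“5” identifier=“e4c7” approved=“no”>\n'
    '<source>Welcome to the world\n'
    '</trans-unit>\n'
)
result = delete_approved(sample)
print(result)  # only the approved=“no” unit is left
```

Because the first .+? runs without the dot-matches-newline flag, it stays on the opening line. An approved=“no” line therefore fails the look-ahead instead of spilling into the next unit.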
-
Hello, @christopher-phillips, and All,
To delete all the <trans-unit......>.........</trans-unit> areas, just execute this regex S/R:

SEARCH (?s)<(trans-unit)\x20((?!<\1).)+?approved="yes".+?</\1>\R

REPLACE Leave EMPTY

- Preferably, check the Wrap around option
- Select the Regular expression search mode
- Click, once, on the Replace All button
Et voilà !
Test it against the text below:

<trans-unit id="1" identifier="e4c7" approved="yes">
<source>Hello world
</trans-unit>
<trans-unit id="5" identifier="e4c7" approved="no">
<source>Welcome to the world
</trans-unit>
<trans-unit id="1" identifier="e4c7" approved="yes">
<source>Hello world
</trans-unit>
<trans-unit id="5" identifier="e4c7" approved="no">
<source>Welcome to the world
</trans-unit>
<trans-unit id="1" identifier="e4c7" approved="yes">
<source>Hello world
</trans-unit>
<trans-unit id="5" identifier="e4c7" approved="no">
<source>Welcome to the world
</trans-unit>
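The same Replace All can be sketched in Python against that test text. This assumes Python's re behaves like the Notepad++ engine for this pattern. The only change is \R, which Python does not support, rewritten as \r?\n. The leading (?s) is allowed as-is because it sits at the very start of the pattern.

```python
import re

# guy038's search regex, with \R replaced by \r?\n (assumption: equivalent here)
GUY = re.compile(r'(?s)<(trans-unit)\x20((?!<\1).)+?approved="yes".+?</\1>\r?\n')


def replace_all(text: str) -> str:
    # Equivalent of "Replace All" with an empty Replace field
    return GUY.sub('', text)


unit_yes = ('<trans-unit id="1" identifier="e4c7" approved="yes">\n'
            '<source>Hello world\n'
            '</trans-unit>\n')
unit_no = ('<trans-unit id="5" identifier="e4c7" approved="no">\n'
           '<source>Welcome to the world\n'
           '</trans-unit>\n')
test_text = (unit_yes + unit_no) * 3

cleaned = replace_all(test_text)
print(cleaned)  # only the three approved="no" units remain
```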
Notes:

- This search regex uses the usual Quotation Mark symbol " (\x{0022}) and not the Left and Right Double Quotation Marks “ and ” (\x{201C} and \x{201D}). Change the double quotes, if necessary!
- The first part (?s) means that the dot will match any single char (standard or EOL chars)
- Then, the regex looks for the < symbol, followed with the string trans-unit, stored as group 1 because of the parentheses, followed with a space char (<(trans-unit)\x20)
- Next, the part ((?!<\1).)+?approved="yes" tries to find the smallest range, even multi-line, of any characters up to the string approved="yes", ONLY IF the string <trans-unit cannot be found at any position of that range
- Finally, the part .+?</\1>\R tries to match the smallest range, even multi-line, of any characters up to the string </trans-unit>, followed with the usual EOL characters of the current line
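To see why the ((?!<\1).)+? part matters, here is a Python comparison on a hypothetical two-unit text where an approved="no" unit comes first. The pattern is written in verbose form with one comment per part from the notes. As an assumption, re.DOTALL stands in for (?s) and \r?\n for \R. The literal space is written \x20 because verbose mode ignores plain spaces.

```python
import re

# Naive version: no tempered dot, so the lazy .+? may start at an
# approved="no" unit and run on into the next unit to find approved="yes".
NAIVE = re.compile(r'<(trans-unit)\x20.+?approved="yes".+?</\1>\r?\n', re.DOTALL)

# guy038's pattern in verbose form; re.DOTALL replaces the leading (?s).
TEMPERED = re.compile(r'''
    <(trans-unit)\x20     # "<", then "trans-unit" stored as group 1, then a space
    ((?!<\1).)+?          # shortest run of chars that never contains "<trans-unit"
    approved="yes"        # the attribute marking the unit to delete
    .+?</\1>              # shortest run up to the closing </trans-unit>
    \r?\n                 # plus the EOL characters of that line
''', re.VERBOSE | re.DOTALL)

unit_no = ('<trans-unit id="5" identifier="e4c7" approved="no">\n'
           '<source>Welcome to the world\n'
           '</trans-unit>\n')
unit_yes = ('<trans-unit id="1" identifier="e4c7" approved="yes">\n'
            '<source>Hello world\n'
            '</trans-unit>\n')
text = unit_no + unit_yes

naive_result = NAIVE.sub('', text)        # the "no" unit is swallowed as well
tempered_result = TEMPERED.sub('', text)  # only the "yes" unit is removed
print(repr(naive_result))
print(repr(tempered_result))
```

The naive match starts at the first <trans-unit and stretches across it to reach the approved="yes" further down. That is the greedy-looking behaviour described in the question.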
Best Regards,
guy038
-
@guy038 said:
Thank you both.
guy038, yours was the one that sorted it for me. Thanks!
-
@guy038 If I have many <trans-unit> elements without approved=“yes”, Notepad++ seems unable to handle it: it selects the whole file from start to finish, as if I had pressed Ctrl+A.
From what I could find, there seems to be an issue with grouping:
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/683

I am not sure what grouping is, but can you think of a workaround?
-
Hi, @christopher-phillips, @terry-r and All,
Indeed, in some cases, the N++ regex engine wrongly matches the entire file contents! So far, I have not been able to find out which condition(s) cause(s) this issue :-((
But if each of your <trans-unit...........approved="yes/no" ranges of characters lies on a single line only, the simpler regex below, without the negative look-ahead structure, should work better:

(?-s)^\h*<(trans-unit)\x20.+approved="yes"((?s).+?)</\1>\R

Cheers,
guy038
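A Python sketch of that simpler regex follows, assuming these translations are close enough. (?-s) is dropped because Python's dot already excludes EOL characters by default. \h is approximated as [ \t]. The inner (?s) becomes a scoped (?s:...). \R becomes \r?\n. re.MULTILINE lets ^ match at each line start. The indented sample is hypothetical.

```python
import re

# guy038's single-line regex, adapted (assumption) to Python's re module
SIMPLE = re.compile(
    r'^[ \t]*<(trans-unit)\x20.+approved="yes"((?s:.+?))</\1>\r?\n',
    re.MULTILINE,
)

sample = (
    '  <trans-unit id="1" identifier="e4c7" approved="yes">\n'
    '    <source>Hello world</source>\n'
    '  </trans-unit>\n'
    '  <trans-unit id="5" identifier="e4c7" approved="no">\n'
    '    <source>Welcome to the world</source>\n'
    '  </trans-unit>\n'
)
result = SIMPLE.sub('', sample)
print(result)  # the indented approved="no" unit is left
```

Because .+approved="yes" cannot cross a line break, an opening tag without approved="yes" simply fails to match. The expensive look-ahead over the whole file is not needed, provided each opening tag fits on one line.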
-
Mine are not on the same line. In some cases there are line breaks within the element as well :-(
<trans-unit id="8" identifier="b2a7b029bf7d20000a606ec7a87bc248">
<source>The old password is not right</source>
<target state="needs-translation">The old password is not right</target>
<note>Context: -> The old password is not correct</note>
</trans-unit>
<trans-unit id="9" identifier="d0d863d18d76100000ad54f79a2eed11">
<source>No account found</source>
<target state="needs-translation">No account found</target>
<note>No account</note>
</trans-unit>
<trans-unit id="11" identifier="bd421d33a9b0000e1e46049b1273eb9" approved="yes">
<source>Cannot get questions</source>
<target>ಪ್ರಶ್ನೆಗಳನ್ನು ಪಡೆಯಲು</target>
<note>Context: #error</note>
</trans-unit>
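One possible alternative for elements with internal line breaks is sketched below. It is not a regex from this thread, just a hypothetical variant. It searches for approved="yes" only inside the opening tag by using [^>]* instead of a dot. [^>] also matches line breaks, so the tag may span several lines, but the search can never run past the tag's closing > into a neighbouring unit. It assumes no attribute value contains a literal >. For illustration, the opening tag of unit 11 is split over two lines here.

```python
import re

# Hypothetical alternative: [^>]* keeps the approved="yes" search inside the
# opening tag (line breaks allowed), then .*? with DOTALL runs to the nearest
# </trans-unit>. Leading spaces/tabs and one trailing line ending are removed too.
APPROVED_UNIT = re.compile(
    r'[ \t]*<trans-unit\b[^>]*\bapproved="yes"[^>]*>.*?</trans-unit>[ \t]*\r?\n?',
    re.DOTALL,
)

kept = ('<trans-unit id="9" identifier="d0d863d18d76100000ad54f79a2eed11">\n'
        '<source>No account found</source>\n'
        '<target state="needs-translation">No account found</target>\n'
        '<note>No account</note>\n'
        '</trans-unit>\n')
removed = ('<trans-unit id="11" identifier="bd421d33a9b0000e1e46049b1273eb9"\n'
           '    approved="yes">\n'
           '<source>Cannot get questions</source>\n'
           '<target>ಪ್ರಶ್ನೆಗಳನ್ನು ಪಡೆಯಲು</target>\n'
           '<note>Context: #error</note>\n'
           '</trans-unit>\n')

result = APPROVED_UNIT.sub('', kept + removed)
print(result)  # unit 9 is kept, unit 11 is removed
```

Bounding the search with [^>]* keeps each match local to one unit, which should avoid the runaway backtracking a tempered dot can cause on large files. This is untested inside Notepad++ itself.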