How to delete a complete tag with specific content
-
Unfortunately, Claudia’s regex can grab too much sometimes. For example, if run on the following data, it will select the first 3 lines with the first Find Next press (good!), but the second Find Next press will match the 6 lines after that (BAD! as that contains a non-zero “Active” field and a replace with “nothing” operation would delete that!):
<Report>
<Active>0000000000</Active>
</Report>
<Report>
<Active>0000000001</Active>
</Report>
<Report>
<Active>0000000000</Active>
</Report>
Try this instead:
Pre-step: Back up your data!
Find what zone: (?s-i)<Report>.*?<Active>(?:(0000000000)|(\d+))</Active>.*?</Report>\R
Replace with zone: (?1:$0)
Search mode: Regular expression
Wrap around checkbox: Ticked
Action: Press the Replace All button
-
Still the same.
Now a larger sample:
<Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000338</IndexPerm><TempIndex>0000000338</TempIndex></Report> <Report> <UserName>Angebot.</UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000339</IndexPerm><TempIndex>0000000339</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000340</IndexPerm><TempIndex>0000000340</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000341</IndexPerm><TempIndex>0000000341</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000342</IndexPerm><TempIndex>0000000342</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000343</IndexPerm><TempIndex>0000000343</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag 
brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000344</IndexPerm><TempIndex>0000000344</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000345</IndexPerm><TempIndex>0000000345</TempIndex></Report> <Report> <UserName>Artikel mit hinterlegten Freifeldern </UserName> <RefReport>Drucker</RefReport> <Filter>Artikelliste</Filter> <VMB>artlist12.vmb</VMB> <Active>0000000001</Active> <ChangeTarget>Nein</ChangeTarget> <UserName filter="handwerk hw_plus">Material mit hinterlegten Freifeldern 1-6</UserName><IndexPerm>0000000346</IndexPerm><TempIndex>0000000346</TempIndex></Report> <Report> <UserName>Kunden mit hinterlegten Freifeldern </UserName> <UserName filter="Lbb Stb">UStIdNr-Anfragen Mandanten</UserName> <RefReport>Drucker</RefReport> <Filter>Kundenliste</Filter> <VMB>KundFrei1.vmb</VMB> <ChangeTarget>Nein</ChangeTarget> <Remark/> <Argument1/> <Argument2/> <UseProgram>Ja</UseProgram> <Active>0000000001</Active> <Group>Kundenliste</Group> <CommandID>0x800B</CommandID> <PreviewBmp>KunUstID.png</PreviewBmp> <IndexPerm>0000000347</IndexPerm><TempIndex>0000000347</TempIndex></Report> <Report> <UserName>Lieferanten mit hinterlegten Freifeldern 1-6</UserName> <RefReport>Drucker</RefReport> <Filter>Lieferantenliste</Filter> <VMB>liefFrei1.vmb</VMB> <Remark></Remark> <Argument1/> <Argument2/> <UseProgram>Ja</UseProgram> <Active>0000000001</Active> <ChangeTarget>Nein</ChangeTarget> <Group>Lieferantenliste</Group> <CommandID>0x8038</CommandID> <PreviewBmp>KunUstID.png</PreviewBmp> <IndexPerm>0000000348</IndexPerm><TempIndex>0000000348</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag 
brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000349</IndexPerm><TempIndex>0000000349</TempIndex></Report> <Report> <UserName>Rechnung </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro.vmb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000001</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000350</IndexPerm><TempIndex>0000000350</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000351</IndexPerm><TempIndex>0000000351</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000352</IndexPerm><TempIndex>0000000352</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000353</IndexPerm><TempIndex>0000000353</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000354</IndexPerm><TempIndex>0000000354</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> 
<ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000355</IndexPerm><TempIndex>0000000355</TempIndex></Report> <Report> <UserName>Auftragsbestätigunglogo </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro.vmb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000001</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000388</IndexPerm><TempIndex>0000000388</TempIndex><Group></Group><Remark></Remark><Argument2></Argument2><UseProgram>Ja</UseProgram></Report> <Report> <UserName>Gutschrift </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro.vmb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000358</IndexPerm><TempIndex>0000000358</TempIndex><Group></Group><Remark></Remark><Argument2></Argument2><UseProgram>Ja</UseProgram></Report> <Report> <UserName>Gutschrift </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro.vmb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000001</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000357</IndexPerm><TempIndex>0000000357</TempIndex><Group></Group><Remark></Remark><Argument2></Argument2><UseProgram>Ja</UseProgram></Report> <Report> <UserName>Gutschrift old</UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro.vmb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000356</IndexPerm><TempIndex>0000000356</TempIndex><Group></Group><Remark></Remark><Argument2></Argument2><UseProgram>Ja</UseProgram></Report> <Report> <UserName>Gutschrift </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro.vmb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> 
<IndexPerm>0000000360</IndexPerm><TempIndex>0000000360</TempIndex><Group></Group><Remark></Remark><Argument2></Argument2><UseProgram>Ja</UseProgram></Report> <Report> <UserName>eigene Bestellungen</UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000001</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000359</IndexPerm><TempIndex>0000000359</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000361</IndexPerm><TempIndex>0000000361</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000362</IndexPerm><TempIndex>0000000362</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000363</IndexPerm><TempIndex>0000000363</TempIndex></Report> <Report> <UserName>Angebot </UserName> <RefReport>Drucker</RefReport> <Filter>netto Auftrag brutto layout</Filter> <VMB>lay_pro_gue.umb</VMB> <Argument1>layout.ini</Argument1> <Active>0000000000</Active> <ChangeTarget>Nein</ChangeTarget> <IndexPerm>0000000364</IndexPerm><TempIndex>0000000364</TempIndex></Report>
-
Well, trying my solution on that new data works for me!
-
I see, my match gets expanded until a Report block with an all-zero Active field occurs.
Scott’s version is working for me as well.
Cheers
Claudia
-
@Christoph-Kahle , @Scott-Sumner
find what:
(?s)<Report>(?:(?!</Report>).)*<Active>0000000000</Active>.*?</Report>
replace with empty
This seems to work as well :-D
Cheers
Claudia
-
So some explanation is probably in order.
The Find what will match EVERY <Report> thru </Report> (with line-ending after the closing tag) one-by-one as it moves thru the file. The difference is that when an all-zeroes Active field is encountered, it gets saved into “group #1”.
The Replace with is where the difference comes in: If “group #1” was matched during the Find, the replacement text is “nothing” (thus a delete operation), otherwise it is the entire match (represented by $0) from the find stage.
The ?1 is a “test” at replacement time. Its general form is (?1a:b) and means “if group #1 was matched at find time, replace with ‘a’ at replace time, otherwise replace with ‘b’”. In our specific case above “a” is omitted, which means “insert nothing here”.
-
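Outside Notepad++, the keep-or-delete logic Scott describes above can be sketched in Python. This is only an illustration under assumptions: Python's re module has neither \R nor the (?1a:b) replacement syntax, so the line ending is spelled out and the conditional becomes a callback. The names PATTERN and drop_inactive are made up for this sketch, and the sample is the three-report data from the start of the thread.

```python
import re

# Assumption: \R is approximated by an explicit CRLF / LF / CR alternative,
# and (?s) is expressed through the re.DOTALL flag.
PATTERN = re.compile(
    r"<Report>.*?<Active>(?:(0000000000)|(\d+))</Active>.*?</Report>(?:\r\n|\n|\r)",
    re.DOTALL,
)

def drop_inactive(match):
    # Group 1 is set only when the Active field is all zeroes:
    # return nothing (delete) in that case, otherwise keep the whole match.
    return "" if match.group(1) is not None else match.group(0)

sample = (
    "<Report>\n<Active>0000000000</Active>\n</Report>\n"
    "<Report>\n<Active>0000000001</Active>\n</Report>\n"
    "<Report>\n<Active>0000000000</Active>\n</Report>\n"
)

result = PATTERN.sub(drop_inactive, sample)
print(result)
```

Only the middle block, the one with a non-zero Active field, should survive.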
Hello, @christoph-kahle, @claudia-frank, @scott-sumner and All,
Now, most members of the N++ community who use regexes in search/replacement operations know the main difference between a lazy and a greedy quantifier ! For instance, given the * quantifier, which is a shortcut for the syntax {0,} :
- The (?-s)0.*9 regex matches the greatest range of characters between the digits 0 and 9 of the same line
- The (?-s)0.*?9 regex matches the smallest range of characters between the digits 0 and 9 of the same line
- The (?s)0.*9 regex matches the greatest range of characters between the digits 0 and 9, even across several lines
- The (?s)0.*?9 regex matches the smallest range of characters between the digits 0 and 9, even across several lines
So, assuming the one-line text, below :
1234567890<Report>First Block</Report>123457890<Report>Second Block</Report>1234567890<Report>Third Block</Report>1234567890
You should, easily, see the difference between the regex (?-s)<Report>.*</Report> and the regex (?-s)<Report>.*?</Report>
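For readers who want to check this outside Notepad++, here is a small Python sketch run on the one-line text above. It assumes Python's re module, where the dot does not match a newline by default, so the (?-s) prefix is simply left out. The variable names are illustrative.

```python
import re

text = (
    "1234567890<Report>First Block</Report>123457890"
    "<Report>Second Block</Report>1234567890"
    "<Report>Third Block</Report>1234567890"
)

# Greedy: a single match running from the first <Report> to the last </Report>.
greedy = re.findall(r"<Report>.*</Report>", text)

# Lazy: one match per block, each stopping at the nearest </Report>.
lazy = re.findall(r"<Report>.*?</Report>", text)

print(len(greedy), len(lazy))
for block in lazy:
    print(block)
```

The greedy form swallows everything between the outermost tags, while the lazy form yields the three blocks separately.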
Just note that the latter regex could be rewritten, as well, as (?-s)<Report>((?!</Report>).)*</Report>. Indeed, this regex means :
- Find a zone, beginning with <Report>, ending with </Report> and which does not contain, at any position, after <Report>, till </Report>, the string </Report> !
Now, why is the first form of Claudia’s regex, below, not totally correct ?
(?s)<Report>.*?<Active>0000000000</Active>.*?</Report>
I, probably, would have built this one, at first sight, too :-) Indeed, this regex looks for the nearest string <Active>0000000000</Active>, after the start tag <Report>, itself followed by the nearest end tag </Report>. Everything seems OK…
However, we make a mistake because the fact of reaching the nearest <Active>0000000000</Active> and the fact of reaching the nearest </Report> are independent events ! And we may find, first, an <Active>0000000000</Active> block, after crossing many end tags </Report> :-((
So, in order to tell the regex engine to find, first, the nearest string <Active>0000000000</Active>, of the SAME block <Report>.....</Report>, we must use Claudia’s second regex, below :
(?s)<Report>(?:(?!</Report>).)*<Active>0000000000</Active>.*?</Report>
Notes :
- No need to change the first * greedy quantifier, before <Active>, into its lazy form ! Actually, due to the negative look-ahead, you already know that the range, between <Report> and <Active>..., is part of the current <Report>...</Report> zone, which always contains, in Christoph’s file, a unique block <Active>.....</Active> !
- On the contrary, the last *? lazy quantifier, before </Report>, is, of course, mandatory, to get the end of the current <Report> block
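The difference between Claudia's two regexes can be reproduced with a short Python sketch. It assumes Python's re module rather than the Boost engine used by Notepad++ (the constructs used here behave the same way in both), and it reuses the three-report sample from the start of the thread. The variable names are illustrative.

```python
import re

sample = (
    "<Report>\n<Active>0000000000</Active>\n</Report>\n"
    "<Report>\n<Active>0000000001</Active>\n</Report>\n"
    "<Report>\n<Active>0000000000</Active>\n</Report>\n"
)

# First form: the lazy .*? may cross a </Report> to reach the next all-zero Active.
first = re.compile(
    r"<Report>.*?<Active>0000000000</Active>.*?</Report>", re.DOTALL
)

# Second form: the negative look-ahead forbids crossing </Report> before <Active>.
second = re.compile(
    r"<Report>(?:(?!</Report>).)*<Active>0000000000</Active>.*?</Report>",
    re.DOTALL,
)

first_matches = first.findall(sample)
second_matches = second.findall(sample)

# Any match containing the non-zero Active value has swallowed the active report.
print([("0000000001" in m) for m in first_matches])
print([("0000000001" in m) for m in second_matches])
```

With the first form, the second match starts at the active report and runs on to the end of the third one, so a replace with nothing would delete the active report too. The second form skips the active report entirely.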
Cheers,
guy038
-
-
@guy038 said:
Find a zone, beginning with <Report>, ending with </Report> and which does not contain, at any position, after <Report>, till </Report>, the string </Report> !
Hi Guy! I’m sure you can appreciate the humor in your statement. Of course the string </Report> isn’t going to occur before the string </Report> ! Because if it did, then it would have! ;)
-
Hi, @scott-sumner,
Yes, my statement looks a bit weird ! But English is not my mother tongue and, probably, a better formulation exists in fluent American English :-)
Oh ! Perhaps I should have written :
- Find a zone, beginning with <Report>, ending with </Report> and which does not contain, at any position, after <Report>, till </Report>, ANOTHER string </Report> ?
Actually, I just wanted to point out that the simple case (?-s)a.*?b, with a lazy quantifier, may, as well, be written with a greedy quantifier :
- (?-s)a((?!b).)*b
- a[^b\r\n]*b
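A quick way to convince yourself that the three forms above agree is the following Python sketch. It assumes Python's re module, where the dot already excludes the newline, so (?-s) is omitted. The test string and the helper name full_matches are made up for the illustration.

```python
import re

# Illustrative text: three a...b runs on one line, then an "a" and a "b"
# separated by a line break, which none of the forms should join.
text = "xxaYYbzzabqqa123b\na\nb"

forms = [
    r"a.*?b",        # lazy quantifier
    r"a((?!b).)*b",  # greedy quantifier guarded by a negative look-ahead
    r"a[^b\r\n]*b",  # greedy quantifier over a negated character class
]

def full_matches(pattern, s):
    # finditer + group(0) returns whole matches even when the pattern
    # contains a capturing group, as the second form does.
    return [m.group(0) for m in re.finditer(pattern, s)]

results = [full_matches(p, text) for p in forms]
for pattern, found in zip(forms, results):
    print(pattern, found)
```

All three patterns should report the same list of matches on this text.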
Cheers,
guy038
-
You’ve got to admit, even for native speakers, it’s sometimes hard to translate a regular expression into clear English. I would probably phrase the non-greedy version as
Find a zone, beginning with <Report> and ending with the first instance of </Report>
and the greedy version as
Find a zone, beginning with <Report> and ending with the last instance of </Report>
-
@guy038 said in How to delete a complete tag with specific content:
(?s)<Report>(?:(?!</Report>).)*<Active>0000000000</Active>.*?</Report>
Thank you. It works for me in a similar case.