Remove a word in a block

Pouemes44

Hello,
I would like to remove a word inside two tags,
for example “friends” in this example:
<tag>hello all my friends</tag>
<tag>your friends are kind</tag>
How? Thanks

guy038

Hi, @pouemes44,

Not very easy to find a correct regex ;-)) In addition, I preferred to consider the general case, with several occurrences of the word friends inside a single <tag>.........</tag> block

So, let’s start with the example below, written on a single line:

Jim said : "Here come my friends" <tag>Hello, all my friends ! You are all kind, my friends</tag>  All the friends smiled and its friends answered :<tag>"No, we're just happy to be your friend !". Then, its friends gave a toast to him</tag> After a while, Jim said : "Sorry, my friends, I must go, by now !

With the search regex (?s)\x20friends?(?=((?!<tag>).)*</tag>), you find a space character followed by either the word friend OR friends, ONLY IF it is embedded in a <tag>…</tag> block

And, with an empty replacement zone, it deletes every word friend or friends, along with its leading space character

So, we get the modified text below:

Jim said : "Here come my friends" <tag>Hello, all my ! You are all kind, my</tag>  All the friends smiled and its friends answered :<tag>"No, we're just happy to be your !". Then, its gave a toast to him</tag> After a while, Jim said : "Sorry, my friends, I must go, by now !
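If you want to check this result outside Notepad++, here is a minimal sketch in Python. It assumes that Python’s re module treats this particular pattern the same way as the Boost regex engine of N++ (both support the (?s) modifier, \x20 and look-aheads). The names PATTERN, TEXT and EXPECTED are just illustrative:

```python
import re

# Search regex from the post; the replacement is empty
PATTERN = r'(?s)\x20friends?(?=((?!<tag>).)*</tag>)'

# The single-line example, split into pieces for readability
TEXT = (
    'Jim said : "Here come my friends" '
    '<tag>Hello, all my friends ! You are all kind, my friends</tag>  '
    'All the friends smiled and its friends answered :'
    '<tag>"No, we\'re just happy to be your friend !". '
    'Then, its friends gave a toast to him</tag> '
    'After a while, Jim said : "Sorry, my friends, I must go, by now !'
)

# The modified text shown in the post
EXPECTED = (
    'Jim said : "Here come my friends" '
    '<tag>Hello, all my ! You are all kind, my</tag>  '
    'All the friends smiled and its friends answered :'
    '<tag>"No, we\'re just happy to be your !". '
    'Then, its gave a toast to him</tag> '
    'After a while, Jim said : "Sorry, my friends, I must go, by now !'
)

result = re.sub(PATTERN, '', TEXT)
print(result == EXPECTED)  # True
```

Only the four friend(s) located inside a <tag>…</tag> block are removed; the ones outside the blocks are kept.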

Moreover, even if the example text is split over several lines, as below, this regex does the job, too!

Jim said : "Here come my friends"
<tag>Hello, all my friends ! You are all kind, my friends</tag>
All the friends smiled and its friends answered :
<tag>"No, we're just happy to be your friend !". Then, its friends gave a toast to him</tag>
After a while, Jim said : "Sorry, my friends, I must go, by now !

And also if the <tag>.........</tag> blocks are themselves split over several lines, as below!

Jim said : "Here come my friends"
<tag>Hello, all my friends !
You are all kind, my friends</tag>
All the friends smiled and its friends answered :<tag>"No, we're just happy to be your friend !".
Then, its friends gave a toast to him</tag>
After a while, Jim said : "Sorry, my friends, I must go, by now !
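The same kind of Python sketch (again assuming Python’s re behaves like N++ for this pattern) shows why the (?s) modifier matters for this last version: without it, the dot cannot cross an end of line, so the look-ahead fails whenever the </tag> is on a later line:

```python
import re

# Final regex from the post, and the same regex without (?s), for comparison
WITH_S = r'(?s)\x20friends?(?=((?!<tag>).)*</tag>)'
WITHOUT_S = r'\x20friends?(?=((?!<tag>).)*</tag>)'

# Last example, with the <tag> blocks themselves split over several lines
TEXT = '\n'.join([
    'Jim said : "Here come my friends"',
    '<tag>Hello, all my friends !',
    'You are all kind, my friends</tag>',
    'All the friends smiled and its friends answered :<tag>"No, we\'re just happy to be your friend !".',
    'Then, its friends gave a toast to him</tag>',
    'After a while, Jim said : "Sorry, my friends, I must go, by now !',
])

with_s = re.sub(WITH_S, '', TEXT).splitlines()
without_s = re.sub(WITHOUT_S, '', TEXT).splitlines()

print(with_s[1])     # <tag>Hello, all my !
print(without_s[1])  # <tag>Hello, all my friends !
```

With (?s), all four words inside the blocks are deleted; without it, only the two words having their </tag> on the same line are deleted.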

Notes:

As usual, the (?s) modifier means that the dot meta-character matches any single character (standard and EOL ones)
Then the part \x20friends? simply looks for a space character, followed by the word friend or friends
To easily understand the final part (?=((?!<tag>).)*</tag>), it’s better to first discuss the simpler regex (?=.*</tag>), which is a look-ahead. In other words, a condition which must be verified to get an overall match. This condition says that the word friends? must be followed, further on, by a string </tag> (the greedy .* actually reaches the farthest one)
Now, it’s easy to notice that, between the word friend[s] and the last </tag> of the text, there may be several juxtaposed <tag>........</tag> blocks. So an additional condition must be added: at every position, from the word friend[s] up to the nearest </tag>, NO opening tag <tag> may be found!
Therefore, we replaced the small part .* of the previous regex, standing for any range of characters, with the more restrictive regex ((?!<tag>).)*. Combining it with the rest of the previous regex, we get our final regex (?s)\x20friends?(?=((?!<tag>).)*</tag>)!
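To see why this restrictive part is needed, here is a small Python sketch (assuming Python’s re behaves like N++ for these patterns) counting the replacements made by the simple look-ahead version and by the final regex, on the single-line example. re.subn returns the new string and the number of replacements:

```python
import re

TEXT = (
    'Jim said : "Here come my friends" '
    '<tag>Hello, all my friends ! You are all kind, my friends</tag>  '
    'All the friends smiled and its friends answered :'
    '<tag>"No, we\'re just happy to be your friend !". '
    'Then, its friends gave a toast to him</tag> '
    'After a while, Jim said : "Sorry, my friends, I must go, by now !'
)

# First attempt: only requires SOME </tag> further on
NAIVE = r'(?s)\x20friends?(?=.*</tag>)'
# Final regex: no <tag> allowed between the word and the nearest </tag>
FINAL = r'(?s)\x20friends?(?=((?!<tag>).)*</tag>)'

naive_text, naive_count = re.subn(NAIVE, '', TEXT)
final_text, final_count = re.subn(FINAL, '', TEXT)

print(naive_count)  # 7
print(final_count)  # 4
print(naive_text.startswith('Jim said : "Here come my"'))  # True
```

The simple version also deletes the three friends located between the blocks, because a </tag> still appears somewhere after them.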

Important: My regex does NOT work in case of nested blocks, such as <tag>.........<tag>........</tag>.........</tag>
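A quick Python check of this limitation, on a hypothetical nested text (same assumption that Python’s re behaves like N++): the first friends is inside the outer block, but the inner <tag> stops the look-ahead, so it is NOT deleted:

```python
import re

PATTERN = r'(?s)\x20friends?(?=((?!<tag>).)*</tag>)'
# Hypothetical nested blocks
NESTED = '<tag>my friends <tag>inner</tag> your friends</tag>'

print(re.sub(PATTERN, '', NESTED))
# <tag>my friends <tag>inner</tag> your</tag>
```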

Cheers,

guy038

Pouemes44

Hello guy,
a big thanks!

1:
If I combine two words:
(?s)\x20love poem?(?=((?!<tag>).)*</tag>)
it works, but with the word I wanted, poem, it also took the <!doctype html> of my page, and so deleted everything

2:
Is there a way to delete only poem, and not poems, or the reverse?

3:
I would be happy to show you my project, but I don’t dare to post a link here, and I don’t see how to send private messages

Thanks again

Pouemes44

Another problem:
with the word poem, it also took the poe of the word poetry

Pouemes44

And sometimes it takes the <!doctype html> with two words, but not always; I cannot understand why

guy038

Hi, @pouemes44,

First of all, @pouemes44, let’s go back to my previous search expression \x20friends?.

The \x20 syntax represents a simple space character, of Unicode code-point \x{0020}. You could have used a plain space character instead!
At the end, the quantifier ? is simply a shortcut for the syntax {0,1}, meaning that the character located just before the ? must be present 0 or 1 time. In other words, the plural form s may be present or not, after the word friend

Therefore, if you’re looking for the exact word “poem”, you must not add the question mark special character at the end!
Indeed, the regex poem? means that you want either the string “poem” OR “poe” (letter m present or not!), which is why poe was matched inside the word poetry
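A tiny Python sketch (assuming Python’s re behaves like N++ for such simple patterns) shows what the ? quantifier does, on a made-up sample:

```python
import re

SAMPLE = 'poem poems poetry'

# m? : the letter m may be present or not, so the "poe" of "poetry" matches too
print(re.findall(r'poem?', SAMPLE))   # ['poem', 'poem', 'poe']
# s? : the plural s is optional
print(re.findall(r'poems?', SAMPLE))  # ['poem', 'poems']
```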

Now, to avoid matching your search expression when it’s embedded in a larger word, change the part \x20friends? into the regex \bfriends?\b. The \b syntax is an assertion (a condition which must be respected), which represents the zero-length location between either:

A Word character AND a Non-Word character
A Non-Word character AND a Word character

Remember that, by default, the Word characters are any uppercase or lowercase letter, accented or not, any digit, OR the low line character, of code point \x{005f}

So, the final regex should rather be: (?s)\bfriends?\b(?=((?!<tag>).)*</tag>)
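Here is a Python sketch comparing the two versions on a made-up line containing the word friendship (again assuming Python’s re behaves like N++ here). Note that \b is zero-length, so, unlike \x20, it does not delete the leading space:

```python
import re

OLD = r'(?s)\x20friends?(?=((?!<tag>).)*</tag>)'
NEW = r'(?s)\bfriends?\b(?=((?!<tag>).)*</tag>)'
TEXT = '<tag>my friendship with friends</tag>'

# \x20friends? also matches " friends" at the start of "friendship"
print(re.sub(OLD, '', TEXT))  # <tag>myhip with</tag>
# \b...\b only matches the whole words friend / friends
print(re.sub(NEW, '', TEXT))  # <tag>my friendship with </tag>

# Same idea for the exact word poem only (not poems, not poetry)
print(re.findall(r'\bpoem\b', 'poem poems poetry'))  # ['poem']
```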

If you’re searching for several words, such as the sentence “I love this poem”, just use the regex (?s)\bI love this poem\b(?=((?!<tag>).)*</tag>)

And if, in addition, you want a search that ignores the case of the letters, add the (?i) modifier at the beginning of the regex, meaning that the search will be performed in a case-insensitive way

So, to sum up, if you’re looking for either the exact sentence “I love this poem” OR the sentence “I love these poems”, whatever their case, only between a starting tag <tag> and an ending tag </tag>, wherever they are located, use the regex below:

SEARCH (?si)\bI love th(is poem|ese poems)\b(?=((?!<tag>).)*</tag>)
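And a last Python sketch, on a made-up line, listing what this regex matches (same assumption about Python’s re). The sentence outside the tags is not matched, because a <tag> comes before the next </tag>:

```python
import re

PATTERN = r'(?si)\bI love th(is poem|ese poems)\b(?=((?!<tag>).)*</tag>)'
TEXT = '<tag>i love THIS poem</tag> I love this poem <tag>I love these poems</tag>'

# group(0) is the whole match, whatever the case of the letters
print([m.group(0) for m in re.finditer(PATTERN, TEXT)])
# ['i love THIS poem', 'I love these poems']
```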

For a deeper investigation, and if your HTML file does not contain confidential information, you could send me your file (or part of it), if you don’t mind, by mail, at my address:

Cheers,

guy038

P.S.:

For newbies, regarding regular expression concepts and syntax, begin with this article in the N++ Wiki:

http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

In addition, you’ll find good documentation about the new Boost C++ Regex library, v1.55.0 (similar to the Perl Regular Expressions syntax, v1.48.0), used by Notepad++ since its 6.0 version, at the TWO addresses below:

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

The FIRST link explains the syntax of regular expressions in the SEARCH part
The SECOND link explains the syntax used in the REPLACEMENT part

You may also find valuable information on the sites below:

http://www.regular-expressions.info

http://www.rexegg.com

http://perldoc.perl.org/perlre.html

Be aware that, like any documentation, it may contain some errors! Anyway, if you detect one, that’s good news: you’re improving ;-))

Pouemes44

A big thanks, guy!
With all your explanations, I succeeded in doing things I could not imagine before
Notepad++ with your explanations is really the tool I wanted
I am not so young and I have never learned programming, but with these syntax explanations I succeed in doing almost what I want

guy038

Hi, @pouemes44,

If, despite all the regex links given above, you’re still stuck creating a correct regex, just post your problem. You’re welcome ;-))

Of course, regular expressions are not as powerful as N++ Python or Lua scripts. But you can’t imagine how many tricky text changes can be done with them!!

Cheers,

guy038

Pouemes44

I think that I shall read the regex links before trying Python :-)
Thanks again, guy; your way to copy a block was really great for me