Remove a word in a block



  • Hello,
    I would like to remove a word that appears between two tags,
    for example “friends” in this example:
    <tag>hello all my friends</tag>
    <tag>your friends are kind</tag>
    How can I do that? Thanks



  • Hi, @pouemes44,

    Finding a correct regex is not that easy ;-)) In addition, I preferred to handle the general case, where the word friends occurs several times inside a single <tag>.........</tag> block

    So, let's start with the example below, on a single line :

    Jim said : "Here come my friends" <tag>Hello, all my friends ! You are all kind, my friends</tag>  All the friends smiled and its friends answered :<tag>"No, we're just happy to be your friend !". Then, its friends gave a toast to him</tag> After a while, Jim said : "Sorry, my friends, I must go, by now !
    

    If the search regex is (?s)\x20friends?(?=((?!<tag>).)*</tag>), it finds a space character followed by either the word friend OR friends, ONLY IF they are embedded in a <tag>…</tag> block

    And, with an empty replacement field, it deletes any word friend or friends, together with its leading space character

    So, we get the modified text below :

    Jim said : "Here come my friends" <tag>Hello, all my ! You are all kind, my</tag>  All the friends smiled and its friends answered :<tag>"No, we're just happy to be your !". Then, its gave a toast to him</tag> After a while, Jim said : "Sorry, my friends, I must go, by now !
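    For readers who want to check this replacement outside Notepad++, here is a minimal Python sketch. It assumes that Python's re module handles (?s), \x20 and the look-aheads the same way as the regex engine of Notepad++ for this pattern; the names pattern, text and result are only illustrative, and the text is a shortened version of the example above.

```python
import re

# Regex from the post: a space, then "friend" or "friends", matched only when
# a closing </tag> follows with no opening <tag> in between.
pattern = re.compile(r"(?s)\x20friends?(?=((?!<tag>).)*</tag>)")

text = ('Jim said : "Here come my friends" '
        '<tag>Hello, all my friends ! You are all kind, my friends</tag>'
        '  All the friends smiled')

# An empty replacement deletes each match together with its leading space.
result = pattern.sub("", text)
print(result)
```

    Only the two occurrences inside the <tag>…</tag> block are removed; the ones before and after the block stay in place.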
    

    Moreover, even if the example text is split over several lines, as below, this regex does the job, too !

    Jim said : "Here come my friends"
    <tag>Hello, all my friends ! You are all kind, my friends</tag>
    All the friends smiled and its friends answered :
    <tag>"No, we're just happy to be your friend !". Then, its friends gave a toast to him</tag>
    After a while, Jim said : "Sorry, my friends, I must go, by now !
    

    And it still works if the <tag>.........</tag> blocks are, themselves, split over several lines, as below !

    Jim said : "Here come my friends"
    <tag>Hello, all my friends !
    You are all kind, my friends</tag>
    All the friends smiled and its friends answered :<tag>"No, we're just happy to be your friend !".
    Then, its friends gave a toast to him</tag>
    After a while, Jim said : "Sorry, my friends, I must go, by now !
    

    Notes :

    • As usual, the (?s) modifier means that the dot meta-character will match any single character ( standard and EOL ones )

    • Then the part \x20friends? simply looks for a space character, followed by the word friend or friends

    • To easily understand the final part (?=((?!<tag>).)*</tag>), it’s better to look, first, at the simpler regex (?=.*</tag>), which is a look-ahead. In other words, a condition which must be true to get an overall match, without being part of the match itself. This condition says that the word friends? must be followed, further on, by a </tag> string. As .* is greedy, it first reaches for the farthest </tag> of the text

    • Now, it’s not difficult to notice that, between the word friend[s] and the last </tag> of the text, there are, probably, several juxtaposed blocks <tag>........</tag>. So the simple look-ahead also succeeds for a word located outside any block, as long as some block comes later. An additional condition must be added : between the word and the nearest </tag>, an opening tag <tag> must NOT be found !

    • Therefore, we replace the small part .* of the previous regex, standing for any range of characters, with the more restrictive regex ((?!<tag>).)*, which matches any range of characters containing no <tag>. Combining it with the rest, we get our final regex (?s)\x20friends?(?=((?!<tag>).)*</tag>) !
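
    The difference between the two look-aheads described in these notes can be seen with a small Python sketch. This is only an illustration under the assumption that Python's re module behaves like the Notepad++ engine here; the sample text and the names simple and tempered are made up for the comparison.

```python
import re

text = "my friends <tag>his friends</tag> our friends <tag>x</tag>"

# Simple look-ahead: only requires that some </tag> follows, anywhere later.
simple = re.compile(r"(?s)\x20friends?(?=.*</tag>)")

# Restricted look-ahead: no <tag> may occur before the next </tag>.
tempered = re.compile(r"(?s)\x20friends?(?=((?!<tag>).)*</tag>)")

simple_result = simple.sub("", text)
tempered_result = tempered.sub("", text)

print(simple_result)    # every occurrence removed, even outside the blocks
print(tempered_result)  # only the occurrence inside <tag>...</tag> removed
```

    With the simple look-ahead, the words outside the blocks are deleted too, because a </tag> still exists further on in the text.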


    Important : My regex does NOT work in case of nested blocks, such as <tag>.........<tag>........</tag>.........</tag>
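
    A short Python sketch of that limitation, again assuming that Python's re module behaves like the Notepad++ engine for this pattern (the nested sample text is invented for the illustration):

```python
import re

pattern = re.compile(r"(?s)\x20friends?(?=((?!<tag>).)*</tag>)")

# The word "friends" sits inside the outer block, but an inner <tag> opens
# before the next </tag>, so the look-ahead fails and nothing is deleted.
nested = "<tag>my friends <tag>inner text</tag> say hello</tag>"
nested_result = pattern.sub("", nested)

print(nested_result == nested)
```

    The text comes back unchanged, although the word is inside a block.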

    Cheers,

    guy038



  • Hello guy,
    many thanks

    1:
    If I combine two words
    (?s)\x20love poem?(?=((?!<tag>).)*</tag>)
    it works, but with the word I wanted, poem, it also matched the <!doctype html> of my page, so everything was deleted

    2:
    Is there a way to delete only poem, not poems, or the reverse?

    3:
    I would be happy to show you my project, but I don't dare to post a link here and I don't see how to send private messages

    Thanks again



  • Another problem:
    with the word poem, it also matched the poe of the word poetry



  • And sometimes it also takes the <!doctype html> when I use two words, but not always; I cannot understand why



  • Hi, @pouemes44,

    First of all, pouemes44, just go back to my previous search expression \x20friends?.

    • The \x20 syntax represents a simple space character, of Unicode code-point \x{0020}. You could have used, instead, a literal space character !

    • At the end, the quantifier ? is, simply, a shortcut of the normal syntax {0,1}, meaning that the character located just before the ? must be present 0 or 1 time. In other words, the plural form s may be present or not, after the word friend

    Therefore, if you’re looking for the exact word “poem”, you must not add the question mark special character, at the end !
    Indeed, the regex poem? means that you want, either, the exact word “poem” OR “poe” ( letter m present or not ! )
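
    A quick way to see this is the Python sketch below. It is only an illustration: the sample sentence is invented, and Python's re module is assumed to treat the ? quantifier like the Notepad++ engine.

```python
import re

text = "a poem, some poems, and poetry"

# "poem?" makes the letter m optional, so "poe" alone is enough to match.
with_optional_m = re.findall(r"poem?", text)

# "poems?" makes only the final s optional: "poem" or "poems".
with_optional_s = re.findall(r"poems?", text)

print(with_optional_m)  # ['poem', 'poem', 'poe']
print(with_optional_s)  # ['poem', 'poems']
```

    The third match of the first regex is the poe of poetry, which is exactly the unwanted deletion reported above.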


    Now, to avoid matching your search expression when it is embedded in a larger word, change the part \x20friends? into the regex \bfriends?\b. The \b syntax is an assertion ( a condition which must be respected ), which represents the zero-length location, between, either :

    • A Word character AND a Non-Word character

    • A Non-Word character AND a Word character

    Remember that the default Word characters range is any uppercase / lowercase letter, accented or not, any digit OR the low line character ( underscore ), of code point \x{005f}

    So, the final regex should rather be : (?s)\bfriends?\b(?=((?!<tag>).)*</tag>)
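
    The effect of the two \b assertions can be checked with the following Python sketch, assuming that \b in Python's re module marks the same word boundaries as in Notepad++ for plain ASCII words; the sample words are invented.

```python
import re

text = "friend friends friendship befriend"

# Without \b, the expression also matches inside longer words.
loose = re.findall(r"friends?", text)

# With \b on both sides, only the whole words friend and friends match.
strict = re.findall(r"\bfriends?\b", text)

print(loose)
print(strict)  # ['friend', 'friends']
```

    In the same way, \bpoem\b matches the word poem only, and \bpoems\b the word poems only.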

    If you’re searching for several words, such as the sentence “I love this poem”, just use the regex (?s)\bI love this poem\b(?=((?!<tag>).)*</tag>)

    And, if, in addition, you want a search that ignores the case of the letters, add the (?i) modifier at the beginning of the regex, meaning that the search will be performed in a case-insensitive way

    So, to sum up, if you’re looking for, either, the exact sentence “I love this poem” OR the sentence “I love these poems”, whatever their case, between a starting tag <tag> and an ending tag </tag> only, whatever their location, use the regex, below :

    SEARCH (?si)\bI love th(is poem|ese poems)\b(?=((?!<tag>).)*</tag>)
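
    As a rough check of this last regex, here is a Python sketch. It assumes that the (?si) modifiers and the look-aheads work in Python's re module as they do in Notepad++; the sample text and the variable names are made up.

```python
import re

pattern = re.compile(
    r"(?si)\bI love th(is poem|ese poems)\b(?=((?!<tag>).)*</tag>)"
)

text = ("I love this poem "
        "<tag>i LOVE these POEMS, and I love this poem\ntoo</tag>"
        " I love these poems")

# Collect the full text of every match.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['i LOVE these POEMS', 'I love this poem']
```

    The two sentences outside the block are ignored, and the line break inside the block does not prevent the second match, thanks to the (?s) modifier.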


    For a deeper investigation, and if your html file does not contain confidential information, you could send me your file ( or part of it), if you don’t mind, by mail, at my address :

    tguy.038@gmail.com

    Cheers,

    guy038

    P.S.:

    For newcomers to the concept and syntax of regular expressions, begin with this article, in the N++ Wiki :

    http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

    In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


    You may, also, look for valuable information, on the sites, below :

    http://www.regular-expressions.info

    http://www.rexegg.com

    http://perldoc.perl.org/perlre.html

    Be aware that, as any documentation, it may contain some errors ! Anyway, if you detected one, that’s good news : you’re improving ;-))



  • Many thanks, guy
    With all your explanations I succeeded in doing things I could not imagine before
    Notepad++ with your explanations is really the tool I wanted
    I am not so young and I have never learned programming, but with these syntax explanations I succeed in doing ~ what I want



  • Hi, @pouemes44,

    If, despite all the regex links given above, you’re still stuck creating a correct regex, just post your problem. You’re welcome ;-))

    Of course, regular expressions are not as powerful as N++ Python or Lua scripts. But, you can’t imagine how tricky text changes can be done with them !!

    Cheers,

    guy038



  • I think that I shall read the regex links before trying Python :-)
    Thanks again, guy, your way to copy a block was really great for me

