    remove a word in a block

    • Pouemes44

      Hello,
      I would like to remove a word inside a pair of tags,
      for instance “friends” in this example:
      <tag>hello all my friends</tag>
      <tag>your friends are kind</tag>
      How can I do that? Thanks

      • guy038

        Hi, @pouemes44,

        Finding a correct regex is not that easy ;-)) In addition, I preferred to handle the general case, where the word friends may occur several times inside a single <tag>.........</tag> block

        So, let's start with the example below, on a single line :

        Jim said : "Here come my friends" <tag>Hello, all my friends ! You are all kind, my friends</tag>  All the friends smiled and its friends answered :<tag>"No, we're just happy to be your friend !". Then, its friends gave a toast to him</tag> After a while, Jim said : "Sorry, my friends, I must go, by now !
        

        The search regex (?s)\x20friends?(?=((?!<tag>).)*</tag>) finds a space character followed by the word friend OR friends, ONLY IF it is embedded in a <tag>…</tag> block

        And, with an empty replacement field, it deletes every word friend or friends, together with its leading space character

        So, we get the modified text below :

        Jim said : "Here come my friends" <tag>Hello, all my ! You are all kind, my</tag>  All the friends smiled and its friends answered :<tag>"No, we're just happy to be your !". Then, its gave a toast to him</tag> After a while, Jim said : "Sorry, my friends, I must go, by now !
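
The same search/replace can be reproduced outside Notepad++. The snippet below is only a sketch: it uses Python's re module, which is not the Boost engine used by Notepad++, although this particular regex uses only syntax that both accept. The names PATTERN, text and result are made up for the illustration.

```python
import re

# Regex from the post: a space, then "friend" or "friends", kept only when
# a closing </tag> can be reached without crossing an opening <tag>.
PATTERN = re.compile(r"(?s)\x20friends?(?=((?!<tag>).)*</tag>)")

# The single-line example text from the post.
text = (
    'Jim said : "Here come my friends" '
    "<tag>Hello, all my friends ! You are all kind, my friends</tag>"
    "  All the friends smiled and its friends answered :"
    "<tag>\"No, we're just happy to be your friend !\". "
    "Then, its friends gave a toast to him</tag>"
    ' After a while, Jim said : "Sorry, my friends, I must go, by now !'
)

# Empty replacement: each match (leading space included) is deleted.
result = PATTERN.sub("", text)
print(result)
```

Only the occurrences located inside a <tag>…</tag> block are removed; the four occurrences outside the blocks stay untouched.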
        

        Moreover, even if the example text is split over several lines, as below, this regex does the job, too !

        Jim said : "Here come my friends"
        <tag>Hello, all my friends ! You are all kind, my friends</tag>
        All the friends smiled and its friends answered :
        <tag>"No, we're just happy to be your friend !". Then, its friends gave a toast to him</tag>
        After a while, Jim said : "Sorry, my friends, I must go, by now !
        

        It also works if the <tag>.........</tag> blocks are, themselves, split over several lines, as below !

        Jim said : "Here come my friends"
        <tag>Hello, all my friends !
        You are all kind, my friends</tag>
        All the friends smiled and its friends answered :<tag>"No, we're just happy to be your friend !".
        Then, its friends gave a toast to him</tag>
        After a while, Jim said : "Sorry, my friends, I must go, by now !
        

        Notes :

        • As usual, the (?s) modifier means that the dot meta-character matches any single character ( standard and EOL ones )

        • Then the part \x20friends? simply looks for a space character, followed by the word friend or friends

        • To easily understand the final part (?=((?!<tag>).)*</tag>), it’s better to look, first, at the simpler regex (?=.*</tag>), which is a look-ahead. In other words, a condition which must be true, at the current location, to get an overall match, without being part of the match itself. This condition says that the word friends? must be followed, further on, by the string </tag> ( the greedy .* reaches for the farthest one )

        • Now, between the word friend[s] and the last </tag> of the text, there are, probably, several consecutive <tag>........</tag> blocks, so this simple condition would also accept a word located outside any block. An additional condition must be added : between the word and the closing </tag>, NO opening tag <tag> may be found !

        • Therefore, we replace the small part .* of the previous regex, standing for any range of characters, with the more restrictive regex ((?!<tag>).)*, which consumes a character only when the string <tag> does not begin at that location. Combining it with the previous regex, we get our final regex (?s)\x20friends?(?=((?!<tag>).)*</tag>) !
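
To see what the restriction ((?!<tag>).)* buys compared with the plain .*, the two look-aheads from the notes can be run side by side. This is a small sketch in Python's re module (an assumption: Notepad++ itself uses Boost regex, not Python), on a short made-up text; naive, tempered and the other names are hypothetical.

```python
import re

# Short made-up text: one "friends" inside a block, three outside.
text = "my friends <tag>dear friends</tag> old friends <tag>x</tag> bye friends"

# Simple look-ahead: any </tag> further on is enough.
naive = re.compile(r"(?s)\x20friends?(?=.*</tag>)")
# Restricted look-ahead: </tag> must be reached before any opening <tag>.
tempered = re.compile(r"(?s)\x20friends?(?=((?!<tag>).)*</tag>)")

naive_result = naive.sub("", text)
tempered_result = tempered.sub("", text)

print(naive_result)     # words outside the blocks are deleted as well
print(tempered_result)  # only the word inside <tag>...</tag> is deleted
```

With the simple look-ahead, every friends that has a </tag> somewhere after it is deleted, including those outside the blocks; the restricted version only touches the one inside a block.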


        Important : My regex does NOT work in case of nested blocks, such as <tag>.........<tag>........</tag>.........</tag>
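
The limitation with nested blocks can be checked on a tiny made-up example. Again a sketch with Python's re module rather than Notepad++ itself; nested and nested_result are hypothetical names.

```python
import re

PATTERN = re.compile(r"(?s)\x20friends?(?=((?!<tag>).)*</tag>)")

# Made-up text with a block nested inside another block.
nested = "<tag>outer friends <tag>inner friends</tag> tail friends</tag>"

nested_result = PATTERN.sub("", nested)
print(nested_result)
```

The first friends lies inside the outer block, but an opening <tag> comes before the next </tag>, so the look-ahead fails and that occurrence is left in place.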

        Cheers,

        guy038

        • Pouemes44

          Hello guy,
          many thanks

          1:
          If I combine two words
          (?s)\x20love poem?(?=((?!<tag>).)*</tag>)
          it works, but with the word I wanted, poem, it also took the <!doctype html> of my page, so everything was deleted

          2:
          Is there a way to delete only poem and not poems, or the reverse?

          3:
          I would be happy to show you my project, but I don't dare to post a link here and I don't see how to send private messages

          Thanks again

          • Pouemes44

            Another problem:
            with the word poem, it also took the poe of the word poetry

            • Pouemes44

              And sometimes it takes the <!doctype html> with two words, but not always. I cannot understand why

              • guy038

                Hi, @pouemes44,

                First of all, pouemes44, just go back to my previous search expression \x20friends?.

                • The \x20 syntax represents a simple space character, of Unicode code-point \x{0020}. You could have typed, instead, a literal space character !

                • At the end, the quantifier ? is, simply, a shortcut for the normal syntax {0,1}, meaning that the character located right before the ? must be present 0 or 1 time. In other words, the plural form s may be present or not, after the word friend

                Therefore, if you’re looking for the exact word “poem”, you must not add the question mark special character, at the end !
                Indeed, the regex poem? means that you want, either, the exact word “poem” OR “poe” ( letter m present or not ! ), which is why the letters poe of the word poetry were matched
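
The effect of the trailing ? is easy to check on a short sentence. A minimal sketch with Python's re module (not the Notepad++ engine, but the ? quantifier behaves the same way in both); the sample sentence and variable names are made up.

```python
import re

sample = "a poem about poetry"

# "poem?" : the final m is optional, so the "poe" of "poetry" matches too.
with_question_mark = re.findall(r"poem?", sample)
# "poem{0,1}" : the long form of the same quantifier.
with_braces = re.findall(r"poem{0,1}", sample)
# "poem" : the m is mandatory, so "poetry" is left alone.
exact = re.findall(r"poem", sample)

print(with_question_mark)  # ['poem', 'poe']
print(with_braces)         # ['poem', 'poe']
print(exact)               # ['poem']
```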


                Now, to avoid that you get matches of your searched expression, when it’s embedded in a larger word, change the part \x20friends? into the regex \bfriends?\b. The \b syntax is an assertion ( a condition which must be respected ), which represents the zero-length location, between, either :

                • A Word character AND a Non-Word character

                • A Non-Word character AND a Word character

                Remember that, by default, the Word characters are any uppercase / lowercase letter, accented or not, any digit OR the underscore character ( low line ), of code point \x{005f}

                So, the final regex should rather be : (?s)\bfriends?\b(?=((?!<tag>).)*</tag>)
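
The difference between the two versions can be seen on a made-up line containing the word friendship. This is a sketch with Python's re module, assumed here as a stand-in for the Notepad++ engine; line, old_result and new_result are hypothetical names.

```python
import re

line = "<tag>my friends value friendship</tag>"

# First version: leading space + friend(s), no boundary check.
old = re.compile(r"(?s)\x20friends?(?=((?!<tag>).)*</tag>)")
# Revised version: \b on both sides, so only whole words match.
new = re.compile(r"(?s)\bfriends?\b(?=((?!<tag>).)*</tag>)")

old_result = old.sub("", line)
new_result = new.sub("", line)

print(old_result)  # "friendship" is damaged
print(new_result)  # "friendship" is kept
```

Note that the \b version no longer includes the leading space in the match, so a double space is left where the word was deleted.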

                If you’re searching for several words, such as the sentence “I love this poem”, just use the regex (?s)\bI love this poem\b(?=((?!<tag>).)*</tag>)

                And if, in addition, you want a search which ignores the case of the letters, add the (?i) modifier, at the beginning of the regex, meaning that the regex search will be performed in a case-insensitive way

                So, to sum up, if you’re looking for, either, the exact sentence “I love this poem” OR the sentence “I love these poems”, whatever their case, between a starting tag <tag> and an ending tag </tag> only, whatever their location, use the regex, below :

                SEARCH (?si)\bI love th(is poem|ese poems)\b(?=((?!<tag>).)*</tag>)
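
As a quick check of this last regex, here is a sketch with Python's re module (an assumption, since Notepad++ uses Boost regex) on a made-up text; page and found are hypothetical names.

```python
import re

PATTERN = re.compile(
    r"(?si)\bI love th(is poem|ese poems)\b(?=((?!<tag>).)*</tag>)"
)

# Made-up text: one sentence outside the block, three candidates inside,
# with mixed case and a line break inside the block.
page = (
    "I love this poem "
    "<tag>i LOVE these poems. I love this poem\n"
    "and I love this poetry</tag>"
)

# finditer + group(0) gives the whole match, not the inner groups.
found = [m.group(0) for m in PATTERN.finditer(page)]
print(found)  # ['i LOVE these poems', 'I love this poem']
```

The sentence before the opening <tag> is skipped, and “I love this poetry” is skipped as well, because neither alternative fits the word poetry.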


                For a deeper investigation, and if your html file does not contain confidential information, you could send me your file ( or part of it), if you don’t mind, by mail, at my address :

                Cheers,

                guy038

                P.S.:

                For newcomers to the concept and syntax of regular expressions, begin with this article, in the N++ Wiki :

                http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

                In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

                http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

                http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

                • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

                • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


                You may, also, look for valuable information on the sites below :

                http://www.regular-expressions.info

                http://www.rexegg.com

                http://perldoc.perl.org/perlre.html

                Be aware that, as any documentation, it may contain some errors ! Anyway, if you detected one, that’s good news : you’re improving ;-))

                • Pouemes44

                  Many thanks, guy
                  With all your explanations, I managed to do things I could not imagine before
                  Notepad++, with your explanations, is really the tool I wanted
                  I am not so young and I have never learned programming, but with these syntax explanations I manage to do roughly what I want

                  • guy038

                    Hi, @pouemes44,

                    If, despite all the regex links given above, you’re still stuck on building a correct regex, just post your problem. You’re welcome ;-))

                    Of course, regular expressions are not as powerful as N++ Python or Lua scripts. But you can’t imagine how many tricky text changes can be done with them !!

                    Cheers,

                    guy038

                    • Pouemes44

                      I think that I shall read the regex links before trying Python :-)
                      Thanks again, guy, your way of handling a block was really great for me

                      The Community of users of the Notepad++ text editor.