    remove a word in a block

      Pouemes44

      Hello,
      I would like to remove a word inside a pair of tags,
      for example “friends” in these lines:
      <tag>hello all my friends</tag>
      <tag>your friends are kind</tag>
      How can I do that? Thanks.

        guy038

        Hi, @pouemes44,

        Finding a correct regex here is not that easy ;-)) In addition, I preferred to handle the general case, where the word friends occurs several times inside a single <tag>.........</tag> block

        So, let's start with the example below, on a single line :

        Jim said : "Here come my friends" <tag>Hello, all my friends ! You are all kind, my friends</tag>  All the friends smiled and its friends answered :<tag>"No, we're just happy to be your friend !". Then, its friends gave a toast to him</tag> After a while, Jim said : "Sorry, my friends, I must go, by now !
        

        The search regex (?s)\x20friends?(?=((?!<tag>).)*</tag>) finds a space character followed by the word friend OR friends, but ONLY IF they are located inside a <tag>…</tag> block

        And, with an empty replacement field, it deletes every word friend or friends, together with its leading space character

        So, we get the modified text below :

        Jim said : "Here come my friends" <tag>Hello, all my ! You are all kind, my</tag>  All the friends smiled and its friends answered :<tag>"No, we're just happy to be your !". Then, its gave a toast to him</tag> After a while, Jim said : "Sorry, my friends, I must go, by now !
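
As a rough cross-check outside Notepad++, the same search and empty replacement can be replayed with Python's re module. This is only a sketch: Python's engine is assumed to be a stand-in for the Boost engine of Notepad++, and the variable names are illustrative. For the constructs used here, both engines are expected to behave alike.

```python
import re

# Search regex from the post. Python's re module is an assumed stand-in
# for the Boost regex engine used by Notepad++.
PATTERN = re.compile(r'(?s)\x20friends?(?=((?!<tag>).)*</tag>)')

text = ('Jim said : "Here come my friends" '
        '<tag>Hello, all my friends ! You are all kind, my friends</tag>  '
        'All the friends smiled and its friends answered :'
        '<tag>"No, we\'re just happy to be your friend !". '
        'Then, its friends gave a toast to him</tag> '
        'After a while, Jim said : "Sorry, my friends, I must go, by now !')

# An empty replacement deletes each match, leading space included.
result = PATTERN.sub('', text)
print(result)
```

Only the four occurrences located inside the two <tag>…</tag> blocks are removed; the four occurrences outside the blocks are left alone.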
        

        Moreover, even if the example text is split over several lines, as below, this regex still does the job !

        Jim said : "Here come my friends"
        <tag>Hello, all my friends ! You are all kind, my friends</tag>
        All the friends smiled and its friends answered :
        <tag>"No, we're just happy to be your friend !". Then, its friends gave a toast to him</tag>
        After a while, Jim said : "Sorry, my friends, I must go, by now !
        

        And it also works if the <tag>.........</tag> blocks are, themselves, split over several lines, as below !

        Jim said : "Here come my friends"
        <tag>Hello, all my friends !
        You are all kind, my friends</tag>
        All the friends smiled and its friends answered :<tag>"No, we're just happy to be your friend !".
        Then, its friends gave a toast to him</tag>
        After a while, Jim said : "Sorry, my friends, I must go, by now !
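
The multi-line behaviour depends on the (?s) modifier. A small sketch, again with Python's re module as an assumed stand-in for Notepad++ and with a shortened sample text, compares the regex with and without that modifier:

```python
import re

# Shortened multi-line sample: the block itself spans two lines.
text = ('Jim said : "Here come my friends"\n'
        '<tag>Hello, all my friends !\n'
        'You are all kind, my friends</tag>\n'
        'All the friends smiled')

lookahead = r'\x20friends?(?=((?!<tag>).)*</tag>)'
with_s = re.compile('(?s)' + lookahead)   # dot also matches line breaks
without_s = re.compile(lookahead)         # dot stops at each line break

count_with_s = sum(1 for _ in with_s.finditer(text))
count_without_s = sum(1 for _ in without_s.finditer(text))
print(count_with_s, count_without_s)  # 2 1
```

Without (?s), the occurrence on the first line of the block is missed, because the look-ahead cannot cross the line break to reach </tag>.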
        

        Notes :

        • As usual, the (?s) modifier means that the dot meta-character matches any single character ( standard characters and EOL ones )

        • Then, the part \x20friends? simply looks for a space character, followed by the word friend or friends

        • To easily understand the final part (?=((?!<tag>).)*</tag>), it’s better to look, first, at the simpler regex (?=.*</tag>), which is a look-ahead. In other words, a condition which must be true to get an overall match, but which does not consume any character. This condition says that the word friend[s] must be followed, somewhere further on, by the string </tag>

        • Now, that simple condition is not enough : any word friend[s] located BEFORE a block, or BETWEEN two blocks, is also followed, further on, by some </tag>. So an additional condition must be added : between the word friend[s] and that closing </tag>, an opening tag <tag> must NOT be found !

        • Therefore, we replace the small part .* of the previous regex, standing for any range of characters, with the more restrictive regex ((?!<tag>).)*, which matches a range of characters, none of which begins the string <tag>. Combining it with the previous regex, we get our final regex (?s)\x20friends?(?=((?!<tag>).)*</tag>) !
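
The difference between the simple look-ahead and the restricted one, described in the notes above, can be sketched in Python (an assumed stand-in for the Notepad++ engine; the sample text and the names naive and tempered are made up for this illustration):

```python
import re

text = ('my friends <tag>all my friends</tag> the friends '
        '<tag>your friend</tag> bye friends')

# Simple look-ahead: only requires some </tag> further on.
naive = re.compile(r'(?s)\x20friends?(?=.*</tag>)')
# Restricted look-ahead: no <tag> allowed before that </tag>.
tempered = re.compile(r'(?s)\x20friends?(?=((?!<tag>).)*</tag>)')

naive_result = naive.sub('', text)
tempered_result = tempered.sub('', text)
print(naive_result)     # my <tag>all my</tag> the <tag>your</tag> bye friends
print(tempered_result)  # my friends <tag>all my</tag> the friends <tag>your</tag> bye friends
```

The simple version also deletes the words located before and between the blocks; only the word after the last </tag> survives. The restricted version touches the words inside the blocks only.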


        Important : My regex does NOT work in case of nested blocks, such as <tag>.........<tag>........</tag>.........</tag>
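
A short sketch of that limitation, with Python's re module as an assumed stand-in for Notepad++ and an invented nested sample:

```python
import re

PATTERN = re.compile(r'(?s)\x20friends?(?=((?!<tag>).)*</tag>)')

# Invented sample with one block nested inside another.
nested = '<tag>outer friends <tag>inner friends</tag> outer friends</tag>'

nested_result = PATTERN.sub('', nested)
print(nested_result)  # <tag>outer friends <tag>inner</tag> outer</tag>
```

The first word is inside the outer block, yet it is kept, because an opening <tag> comes before the next </tag>.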

        Cheers,

        guy038

          Pouemes44

          Hello guy,
          many thanks!

          1:
          If I combine two words:
          (?s)\x20love poem?(?=((?!<tag>).)*</tag>)
          it works, but with the word I want, poem, it also took the <!doctype html> of my page, so everything was deleted.

          2:
          Is there a way to delete only poem and not poems, or the reverse?

          3:
          I would be happy to show you my project, but I don't dare to post a link here, and I don't see how to send private messages.

          Thanks again.

            Pouemes44

            Another problem:
            with the word poem, it also took the poe of the word poetry.

              Pouemes44

              And sometimes it takes the <!doctype html> when I search for two words, but not always. I cannot understand why.

                guy038

                Hi, @pouemes44,

                First of all, pouemes44, let's go back to my previous search expression \x20friends?.

                • The \x20 syntax represents a simple space character, of Unicode code-point \x{0020}. You could have typed, instead, a literal space character !

                • At the end, the quantifier ? is simply a shortcut for the normal syntax {0,1}, meaning that the character located just before the ? must be present 0 or 1 time. In other words, the plural form s may be present or not, after the word friend

                Therefore, if you’re looking for the exact word “poem”, you must not add the question mark at the end !
                Indeed, the regex poem? means that you want either the exact string “poem” OR the string “poe” ( letter m present or not ! ), which is why it also matched the beginning of “poetry”
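
The effect of the trailing question mark can be checked with a few lines of Python (used here as an assumed stand-in for the Notepad++ engine; the sample sentence is invented):

```python
import re

text = 'a poem about poetry and poems'

with_question_mark = re.findall(r'poem?', text)   # m is optional
with_braces = re.findall(r'poem{0,1}', text)      # same meaning, long form
exact = re.findall(r'poem', text)                 # m is required

print(with_question_mark)  # ['poem', 'poe', 'poem']
print(with_braces)         # ['poem', 'poe', 'poem']
print(exact)               # ['poem', 'poem']
```

Note that the exact form poem still matches the first four letters of “poems”.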


                Now, to avoid matches of your search expression when it’s embedded in a larger word, change the part \x20friends? into the regex \bfriends?\b. The \b syntax is an assertion ( a condition which must be true ), which represents the zero-length location between, either :

                • A Word character AND a Non-Word character

                • A Non-Word character AND a Word character

                Remember that, by default, the Word characters are any uppercase or lowercase letter, accented or not, any digit OR the underscore ( low line ) character, of code point \x{005f}

                So, the final regex should rather be : (?s)\bfriends?\b(?=((?!<tag>).)*</tag>)
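
A minimal sketch of what the \b assertions change, using Python's re module as an assumed stand-in for Notepad++ and an invented word list:

```python
import re

text = 'friend friends friendship befriend friends_list'

# Without \b, the pattern also matches inside larger words.
loose = re.findall(r'friends?', text)
# With \b on both sides, only the stand-alone words match.
bounded = re.findall(r'\bfriends?\b', text)

print(loose)    # ['friend', 'friends', 'friends', 'friend', 'friends']
print(bounded)  # ['friend', 'friends']
```

The last item, friends_list, is rejected because the underscore is a Word character, so there is no boundary after the letter s.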

                If you’re searching for several words, such as the sentence “I love this poem”, just use the regex (?s)\bI love this poem\b(?=((?!<tag>).)*</tag>)

                And if, in addition, you want a search which ignores the case of the letters, add the (?i) modifier at the beginning of the regex, meaning that the search will be performed in a case-insensitive way

                So, to sum up, if you’re looking for either the exact sentence “I love this poem” OR the sentence “I love these poems”, whatever their case, and only between a starting tag <tag> and an ending tag </tag>, use the regex below :

                SEARCH (?si)\bI love th(is poem|ese poems)\b(?=((?!<tag>).)*</tag>)
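
This final search can be tried out with Python's re module, taken here as an assumed stand-in for the Notepad++ engine. The sample text is invented for the illustration:

```python
import re

PATTERN = re.compile(
    r'(?si)\bI love th(is poem|ese poems)\b(?=((?!<tag>).)*</tag>)')

# Two sentences inside the block, with mixed case, and one outside.
text = ('<tag>i LOVE this Poem, and I love these poems too</tag> '
        'I love this poem')

found = [m.group(0) for m in PATTERN.finditer(text)]
print(found)  # ['i LOVE this Poem', 'I love these poems']
```

The last sentence is not matched, as it is located after the closing </tag>.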


                For a deeper investigation, and if your html file does not contain confidential information, you could send me your file ( or part of it), if you don’t mind, by mail, at my address :

                Cheers,

                guy038

                P.S.:

                For newcomers to the concept and syntax of regular expressions, begin with this article, in the N++ Wiki :

                http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

                In addition, you’ll find good documentation about the Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++ since its 6.0 version, at the TWO addresses below :

                http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

                http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

                • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

                • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


                You may also look for valuable information on the sites below :

                http://www.regular-expressions.info

                http://www.rexegg.com

                http://perldoc.perl.org/perlre.html

                Be aware that, like any documentation, these pages may contain some errors ! Anyway, if you detect one, that’s good news : you’re improving ;-))

                  Pouemes44

                  Many thanks, guy.
                  With all your explanations, I succeeded in doing things I could not imagine before.
                  Notepad++, with your explanations, is really the tool I wanted.
                  I am not so young and I have never learned programming, but with these syntax explanations I manage to do roughly what I want.

                    guy038

                    Hi, @pouemes44,

                    If, despite all the regex links given above, you’re still stuck creating a correct regex, just post your problem. You’re welcome ;-))

                    Of course, regular expressions are not as powerful as N++ Python or Lua scripts. But you can’t imagine what tricky text changes can be done with them !!

                    Cheers,

                    guy038

                      Pouemes44

                      I think I will read the regex links before trying Python :-)
                      Thanks again, guy. Your way of copying a block was really great for me.
