    Help with Regex to delete a block in paragraph/line

    • du p

      Hi all,
      Let’s say I have the paragraph below. Is it possible to use Notepad++ to delete just the <block…> parts? The <block> could vary in length, but it is shorter than 20 characters.

      Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed concerns can men now.

      Thank you

      • Scott Sumner @du p

        @du-p

        I would try a Replace operation as follows:

        Find-what zone: <block\d+>
        Replace-with zone: make sure this is empty
        Search mode: Regular expression

        \d+ stands for one or more digit characters, 0-9. From your description it is hard to tell what you might need to delete as far as whitespace goes, on either side of the bracketed text…
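As a rough illustration of what this digits-only pattern does, here is a minimal sketch in Python. Python's `re` module is a different engine from the one in Notepad++, so this only approximates the Replace dialog; the shortened sample text and the variable names are assumptions made for the example.

```python
import re

# Shortened sample based on the original poster's paragraph (an assumption).
text = (
    "Answer <block01>misery adieus. <block0002 > Pretended belonging. "
    "Natural <block0000005>end law."
)

# <block\d+> : the literal "<block", one or more digits, then ">".
digits_only = re.compile(r"<block\d+>")

matches = digits_only.findall(text)   # every block the pattern finds
cleaned = digits_only.sub("", text)   # same as replacing with an empty string

print(matches)
print(cleaned)
```

In this sketch `<block0002 >` survives, because the space before `>` is not a digit. That is the whitespace question raised above.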

        • du p

          Forgot to mention that there can be any character, not only digits, after the word “block”. Ex: <block a>, <block bac>. I’m wondering whether you can delete the block based on the start character “<” and the end character “>”. If I use the wildcard “*”, as in <(.*)>, it selects every character up to the end of the paragraph. I’m wondering if there is a way to define a range for the wildcard. For example, something arbitrary like <(20)> would look for 20 characters within the <>.

          • Scott Sumner @du p

            @du-p

            Sorry, if you can’t describe your data well with the first go-round, I lose interest; maybe somebody else can pick it up and help…?

            • guy038

              Hello @du-p, and All,

              Ah, du-p! I understand your problem!

              Let’s start with the text below, where I added three <block....> ranges to your initial text:

              Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.
              

              You’ve certainly tried the regex <block.*> and were surprised to notice that it matches everything from <block01> up to and including the last block, <block123456789012345678901>, weren’t you?

              That is simply because the dot (.), followed by the quantifier * (or its equivalent {0,}), matches the longest possible run of standard characters that is still followed by a > symbol, so the match extends to the last > on the line. That’s the default “greedy” behaviour.
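The greedy behaviour can be reproduced outside Notepad++. The following is a minimal sketch in Python, whose `re` module is a different engine but treats `.*` the same way in this case; the shortened sample text and the variable names are assumptions made for the example.

```python
import re

# Shortened, single-line sample (an assumption for the example).
text = (
    "Answer <block01>misery adieus. <block0002 > Pretended. "
    "Natural <block0000005>end law."
)

# Greedy: .* grabs as much as it can, then backs off to the LAST ">".
greedy_match = re.search(r"<block.*>", text)
greedy_span = greedy_match.group(0)

# Deleting the greedy match removes everything between the first
# "<block" and the last ">" on the line.
after_delete = text.replace(greedy_span, "")

print(greedy_span)
print(after_delete)
```

Here a single match swallows all three blocks and the prose between them, so only the text before the first block and after the last one is left.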

              And I guess that you wanted to limit the search to the next 20 characters after <block, to get individual blocks only!

              That is unnecessary, as you can use the “lazy” behaviour instead, by adding a question mark after the * quantifier!

              So, the final regex S/R would be:

              SEARCH (?-si)<block.*?>

              REPLACE Leave EMPTY

              OPTIONS Regular expression

              ACTION : Replace or Replace All

              Notes :

              • The first part (?-si) (equivalent to (?-s)(?-i)) means that:

                • The dot special character matches any single standard character only, and NOT End of Line chars

                • The search is case-sensitive. So it would not match, for instance, the string <BlocK...>

              • Then it matches the exact string <block, followed by the smallest range of standard characters, up to the next closing > symbol

              • As the replacement zone is empty, these <block....> ranges are simply deleted
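For readers who want to try the lazy search outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module, which is a different engine: the `(?-si)` prefix is left out because `re` is already case-sensitive and its dot already excludes newlines by default. The shortened sample text, including the `<BlocK9>` tag added to show case sensitivity, is an assumption made for the example.

```python
import re

# Shortened sample; <BlocK9> is added only to illustrate case sensitivity.
text = (
    "Answer <block01>misery <block0002 > Pretended suffering<block> "
    "favourite <BlocK9>drift <block a>end."
)

# Lazy: .*? takes as few characters as possible, so each match
# stops at the FIRST ">" after "<block".
lazy = re.compile(r"<block.*?>")

removed = lazy.findall(text)   # the individual blocks that get deleted
cleaned = lazy.sub("", text)   # replacement is the empty string

print(removed)
print(cleaned)
```

Each block is matched on its own, whatever sits between `<block` and `>`, while the mixed-case `<BlocK9>` is left in place.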


              Note that you can also enforce the 20-character limit, if you prefer! This time, the regex becomes:

              (?-si)<block.{0,20}?>

              Applied against the text below :

              Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.
              

              It would miss the last <block...>, because 21 digits sit between <block and >!
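The effect of the 20-character limit can be checked with a minimal sketch in Python. As before, `re` is a different engine from the one in Notepad++ and the `(?-si)` prefix is omitted because it matches Python's defaults; the sample is just the tail of the text above, and the variable names are assumptions made for the example.

```python
import re

# Tail of the sample text: one block with 20 digits, one with 21.
text = (
    "informed <block12345678901234567890>concerns can men "
    "<block123456789012345678901>now."
)

# .{0,20}? : between 0 and 20 characters, as few as possible, then ">".
bounded = re.compile(r"<block.{0,20}?>")

removed = bounded.findall(text)
cleaned = bounded.sub("", text)

print(removed)
print(cleaned)
```

Only the 20-digit block is deleted. The 21-digit block stays, because no `>` can be reached within 20 characters of its `<block`.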

              Best Regards,

              guy038

              • du p @guy038

                @guy038
                Thank you for the thorough explanation

                (?-si)<block.*?> works better than expected for me, even when the text inside <block…> exceeds 20 characters.

                The Community of users of the Notepad++ text editor.