Help with Regex to delete a block in paragraph/line

    du p, Dec 9, 2017, 9:37 PM

    Hi all,
    Let’s say I have the paragraph below. Is it possible to use Notepad++ to delete just the <block…> parts? The <block> can vary in length, but it is shorter than 20 characters.

    Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed concerns can men now.

    Thank you

      Scott Sumner, replying to du p, Dec 9, 2017, 9:50 PM

      @du-p

      I would try a Replace operation as follows:

      Find-what zone: <block\d+>
      Replace-with zone: make sure this is empty
      Search mode: Regular expression

      \d+ stands for one or more digit characters, 0-9. From your description, it is hard to tell what you might need to delete as far as whitespace goes, on either side of the bracketed text…
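      A minimal sketch of the same Replace operation in Python’s standard `re` module, assuming its handling of `\d`, `+` and literal text matches Notepad++’s regex engine for a pattern this simple. The string is a shortened copy of the sample paragraph. It also shows that `<block0002 >` is left untouched, because the space before `>` is not a digit.

```python
import re

# Shortened copy of the sample paragraph (note the space in "<block0002 >").
text = ("Answer <block01>misery adieus add wooded how nay men before though. "
        "<block0002 > Pretended belonging contented mrs suffering favourite "
        "you the continual. Natural <block0000005>end law whether but and "
        "towards certain.")

# Same pattern as the Find-what zone: "<block", one or more digits, then ">".
# An empty replacement string deletes every match.
result = re.sub(r"<block\d+>", "", text)
print(result)
```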

        du p, Dec 9, 2017, 10:05 PM

        I forgot to mention that there can be characters other than digits after the word “block”, e.g. <block a>, <block bac>. I’m wondering whether you can delete the block based on the start character “<” and the end character “>”. If I use the wildcard “*”, as in <(.*)>, it selects all the characters until the end of the paragraph. Is there a way to define a range for the wildcard? For example, something arbitrary like <(20)> would look for 20 characters within the <>.
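        One way to anchor on the “<” and “>” delimiters themselves is a negated character class, `[^>]`, which matches any character except `>` and therefore cannot run past the first closing bracket. This technique is not proposed in the thread; it is offered here as a hedged Python sketch (standard `re` module) with a made-up test string. The second pattern adds the interval quantifier `{m,n}`, which is the regex form of a “range for the wildcard”.

```python
import re

# Made-up test string: short blocks plus one block with 21 digits.
text = "Keep <block a>this, <block bac>and <block123456789012345678901>this <block0002 >too."

# "[^>]*" matches any run of characters except ">", so each match stops
# at the first closing bracket. Unlike the dot, "[^>]" also matches line breaks.
unbounded = re.sub(r"<block[^>]*>", "", text)

# Same idea with an interval quantifier: at most 20 characters
# between "<block" and ">", so the 21-digit block is not matched.
bounded = re.sub(r"<block[^>]{0,20}>", "", text)

print(unbounded)
print(bounded)
```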

          Scott Sumner, replying to du p, Dec 9, 2017, 10:10 PM

          @du-p

          Sorry, if you can’t describe your data well with the first go-round, I lose interest; maybe somebody else can pick it up and help…?

            guy038, Dec 10, 2017, 12:59 PM (last edited 3:28 PM)

            Hello @du-p, and All,

            Ah, du-p, I understand your problem!

            Let’s start with the text below, where I added three <block....> ranges to your initial text:

            Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.
            

            You’ve certainly tried the regex <block.*> and were surprised to see that it matches everything from <block01> through the last block, <block123456789012345678901>, inclusive, weren’t you?

            That’s simply because the dot (.) followed by the quantifier * (or its equivalent {0,}) matches the longest possible run of standard characters that is still followed by a > symbol, so the match ends only at the last > on the line. That’s the default “greedy” behaviour.

            And I guess that you wanted to limit the search to the next 20 characters after <block, in order to match individual blocks only!

            This is unnecessary, as you can use the “lazy” behaviour instead, by adding a question mark after the * quantifier!
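            The difference between greedy and lazy matching can be seen in a short Python sketch (standard `re` module, which follows the same greedy/lazy rules for `*` and `*?`). The string is a trimmed, made-up fragment of the sample text.

```python
import re

text = "Answer <block01>misery adieus. <block0002 > Pretended <block>now."

# Greedy: ".*" first grabs the rest of the line, then backtracks only
# as far as the LAST ">" it can find.
greedy = re.search(r"<block.*>", text).group()

# Lazy: ".*?" expands one character at a time and stops at the FIRST ">".
lazy = re.search(r"<block.*?>", text).group()

print(greedy)
print(lazy)
```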

            So, the final regex S/R (search/replace) would be:

            SEARCH (?-si)<block.*?>

            REPLACE Leave EMPTY

            OPTIONS Regular expression

            ACTION: Replace or Replace All

            Notes:

            • The first part, (?-si) (equivalent to (?-s)(?-i)), means that:

              • The dot special character matches any single standard character only, and NOT end-of-line characters

              • The search is case-sensitive, so it would not match, for instance, the string <BlocK...>

            • Then it matches the exact string <block, followed by the shortest possible run of standard characters up to the first closing > symbol

            • As the replacement zone is empty, these <block....> ranges are simply deleted
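            To check the final regex outside Notepad++, here is a minimal Python sketch using the standard `re` module. One assumption to flag: Python rejects a bare `(?-si)` prefix, so the sketch uses the scoped form `(?-si:...)`, available since Python 3.7, which switches off the same two options (dot-matches-newline and case-insensitivity) for the enclosed pattern. The text is the extended sample, trimmed a little.

```python
import re

# Extended sample text (one line), trimmed.
text = ("Answer <block01>misery adieus add wooded how nay men before though. "
        "<block0002 > Pretended belonging contented mrs suffering<block> favourite "
        "you the continual. Natural <block0000005>end law whether but and towards "
        "certain. Quitting informed <block12345678901234567890>concerns can men "
        "<block123456789012345678901>now.")

# Scoped flags: "-s" keeps the dot from matching line breaks,
# "-i" keeps the match case-sensitive; ".*?" is the lazy quantifier.
pattern = re.compile(r"(?-si:<block.*?>)")

blocks = pattern.findall(text)   # each block is matched on its own
cleaned = pattern.sub("", text)  # like "Replace All" with an empty replacement

print(blocks)
print(cleaned)
```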


            Note that you may also use the 20-character limit, if you prefer! In that case, the regex becomes:

            (?-si)<block.{0,20}?>

            Applied against the text below:

            Answer <block01>misery adieus add wooded how nay men before though. <block0002 > Pretended belonging contented mrs suffering<block> favourite you the continual. Mrs civil nay least means tried drift. Natural <block0000005>end law whether but and towards certain. Furnished unfeeling his sometimes see day promotion. Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now.
            

            It would miss the last <block...>, because 21 digits lie between <block and >!
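            The 20-character cap can be checked in a short Python sketch (standard `re` module). Because Python rejects a bare `(?-si)` prefix, the pattern uses the scoped form `(?-si:...)` (Python 3.7+); the string is the tail of the sample text, containing the 20-digit and 21-digit blocks.

```python
import re

# Tail of the sample text: a 20-digit block and a 21-digit block.
text = "Quitting informed <block12345678901234567890>concerns can men <block123456789012345678901>now."

# ".{0,20}?" is lazy AND capped at 20 characters between "<block" and ">".
bounded = re.compile(r"(?-si:<block.{0,20}?>)")

result = bounded.sub("", text)
print(result)
```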

            Best Regards,

            guy038

              du p, replying to guy038, Dec 10, 2017, 2:56 PM

              @guy038
              Thank you for the thorough explanation.

              (?-si)<block.*?> works better than expected for me, even when the text inside the <block…> exceeds 20 characters.

              The Community of users of the Notepad++ text editor.