Remove all but first paragraph starting with similar string

  • T
    TN MC
    last edited by Dec 26, 2023, 12:54 PM

    Hi,
    I briefly used Notepad++ a few years ago but otherwise don’t have much experience with it. I know how to remove all lines which start with a certain string. I have searched for quite some time but cannot find the code/macro for the following task, which is slightly different from that.

    I want to keep the first paragraph but remove all subsequent paragraphs which start with the same characters. For example, in the following, I want to remove LMN: 2345, XYZ: 3456 and LMN: jkl. The character string is not constant and can vary, but all will end with “:”.
    Thank you.

    BEFORE CODE

    XYZ: yui

    LMN: tyu

    LMN: 2345

    XYZ: 3456

    ABC: 1234

    LMN: jkl

    OPQ: 4567

    AFTER CODE

    XYZ: yui

    LMN: tyu

    ABC: 1234

    OPQ: 4567

    • P
      PeterJones @TN MC
      last edited by PeterJones Dec 26, 2023, 3:23 PM Dec 26, 2023, 3:22 PM

      @TN-MC ,

      If your document is pretty long, then the following is the sequence I would recommend:

      1. Number all the lines
        • Make sure there’s a blank line at the end of your document
        • Ctrl+HOME
        • Alt+Shift+B (Edit > Begin/End Select in Column Mode)
        • Ctrl+END, UpArrow
        • Alt+Shift+B (Edit > Begin/End Select in Column Mode)
        • If you’re in at least v8.6
          • THEN type a space, then left arrow
          • ELSE type a space, then Ctrl+HOME/Alt+Shift+B/Ctrl+END/UpArrow/Alt+Shift+B again
        • Alt+C (Edit > Column Editor)
          **Initial Number:** 1
          **Increase By:** 1
          **Leading:** Zeroes
          OK
      2. Move the numbers
        • Ctrl+H (Search > Replace)
        • FIND WHAT: ^(\d+) (\w+:)
          (Please note: the \w+ will need to change if your example data is not representative and there is not always a single “word” before the colon.)
          REPLACE WITH: $2$1:
          SEARCH MODE = Regular Expression
          REPLACE ALL
      3. Sort
        • Edit > Line Operations > Sort Lines Lexicographically Ascending
        • Make sure there’s a blank line at the end of your document
      4. Reduce to only one copy of each starting word
        • FIND WHAT: (?-s)^(\w+:)(.*\R)(\1.*\R)*
          REPLACE WITH: $1$2
          SEARCH MODE = Regular Expression
          REPLACE ALL
      5. Move the line numbers back to the start
        • FIND WHAT: ^(\w+:)(\d+:)
          REPLACE WITH: $2$1
          SEARCH MODE = Regular Expression
          REPLACE ALL
      6. Sort
        • Edit > Line Operations > Sort Lines Lexicographically Ascending
        • If you have blank lines at the beginning, you can remove those, and add a blank line at the end
      7. Remove the line numbering
        • FIND WHAT: ^(\d+:)(\w+:)
          REPLACE WITH: $2
          SEARCH MODE = Regular Expression
          REPLACE ALL
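
For anyone who wants to check the logic outside Notepad++, the seven steps above can be sketched in Python. This is only an illustration under stated assumptions, not part of the Notepad++ procedure: the function name `keep_first_by_sorting` is made up, `\n` stands in for `\R`, `\1` stands in for `$1`, the data is assumed to have no blank lines between entries, and Python's `sorted()` is assumed to order plain ASCII lines the same way as Sort Lines Lexicographically Ascending.

```python
import re


def keep_first_by_sorting(text):
    """Hypothetical helper mirroring steps 1-7 on a block of 'WORD: value' lines."""
    lines = text.splitlines()
    width = len(str(len(lines)))  # enough digits for "Leading: Zeroes"

    # Step 1: number all the lines, e.g. "1 XYZ: yui"
    numbered = [f"{n:0{width}d} {line}" for n, line in enumerate(lines, 1)]

    # Step 2: move the number behind the starting word, e.g. "XYZ:1: yui"
    moved = [re.sub(r"^(\d+) (\w+:)", r"\2\1:", line) for line in numbered]

    # Step 3: sort, so lines with the same starting word become adjacent
    body = "\n".join(sorted(moved)) + "\n"

    # Step 4: keep only the first line of each run that shares a starting word
    body = re.sub(r"^(\w+:)(.*\n)(\1.*\n)*", r"\1\2", body, flags=re.MULTILINE)

    # Step 5: move the line numbers back to the start, e.g. "1:XYZ: yui"
    body = re.sub(r"^(\w+:)(\d+:)", r"\2\1", body, flags=re.MULTILINE)

    # Step 6: sort again to restore the original order
    body = "\n".join(sorted(body.splitlines())) + "\n"

    # Step 7: remove the line numbering
    return re.sub(r"^(\d+:)(\w+:)", r"\2", body, flags=re.MULTILINE)


sample = "XYZ: yui\nLMN: tyu\nLMN: 2345\nXYZ: 3456\nABC: 1234\nLMN: jkl\nOPQ: 4567\n"
result = keep_first_by_sorting(sample)
print(result)
```

The leading zeroes matter for the same reason they do in the Column Editor: without them, a plain text sort would put line 10 before line 2.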

      • T
        TN MC @PeterJones
        last edited by Dec 26, 2023, 4:19 PM

        @PeterJones Thank you so much for your prompt help. I cannot wait to try it. Actually, I should have created a better example. There could be more than one word before “:”. Also, in real life, there may be a paragraph after “:” and not only a line. When it is a paragraph, I would want that deleted. Thank you.

        As an aside, I have been trying to learn whether there is a difference between a paragraph and a line in Notepad++.

        • P
          PeterJones @TN MC
          last edited by Dec 26, 2023, 4:57 PM

          @TN-MC said in Remove all but first paragraph starting with similar string:

          Actually, I should have created a better example. There could be more than one word before “:”.

          Then my solution will definitely not work without you putting effort into editing the regex and making it match.

          Also, in real life, there may be a paragraph after “:” and not only a line. When it is a paragraph, I would want that deleted. Thank you.

          That will also take more effort on your part.

          As an aside, I have been trying to learn whether there is a difference between a paragraph and a line in Notepad++.

          Notepad++ doesn’t know what you consider a paragraph. I am guessing that for you, paragraphs are separated by an empty line, but Notepad++ doesn’t know that, and you’d have to develop a regex that matches your definition (\R\R matches the newline at the end of the paragraph plus the newline used as a paragraph separator).

          For your solution, in the places where I had a single \R, you’d need two (if I guess correctly as to your definition). Instead of (\w+:), you’d need something like ^(.*?:). And most of your expressions would need (?s) to tell the regex that you want . (“match any character”) to include newlines as “any character” (and would need to drop the (?-s) that I showed in step 4, which made sure . didn’t match newlines).
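
To make those three regex changes concrete, here is a small Python sketch. It is an illustration, not Notepad++ syntax: Python's `re` module has no `\R`, so `(?:\r\n|\n|\r)` is used as a stand-in, and the sample text and variable names are invented for the example.

```python
import re

paragraph = "Some Words: first line\nsecond line of the same paragraph"

# \w+ cannot cross the space in "Some Words", so it finds no prefix at the start.
no_match = re.match(r"(\w+:)", paragraph)

# The lazy .*? stops at the first colon, however many words come before it.
prefix = re.match(r"(.*?:)", paragraph).group(1)

# Without dot-all, . stops at the line break, so only the first line is captured.
first_line_only = re.match(r"(.*?:)(.*)", paragraph).group(2)

# With (?s), . also matches newlines, so the rest of the paragraph is captured.
whole = re.match(r"(?s)(.*?:)(.*)", paragraph).group(2)

# Two consecutive newline sequences (the \R\R idea) separate paragraphs.
text = "A B: one\nstill one\r\n\r\nC: two"
paragraphs = re.split(r"(?:\r\n|\n|\r){2}", text)

print(no_match, repr(prefix), repr(first_line_only), repr(whole), paragraphs)
```

One thing to watch with dot-all switched on: if a paragraph's first line has no colon, the lazy `.*?:` will keep going onto later lines until it finds one.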

          Also, the numbering likely won’t work as well for you as it did for me, since your paragraphs go across multiple lines. My suggestion would be to join together paragraphs into a single line before doing the numbering, and then split at the end. If you need ideas on how to do that, search the forum for posts by me that include the ☺ smiley, because I often use that in examples where I’ve joined lines together.
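
The join-then-split idea can be sketched in Python as follows. The assumptions are loud ones: paragraphs are taken to be separated by one or more empty lines, the ☺ placeholder is assumed never to occur in the real text, all function names are hypothetical, and the duplicate removal here uses a simple "already seen" set rather than the number-and-sort sequence, purely to keep the example short.

```python
import re

MARK = "\u263a"  # the ☺ placeholder; assumed absent from the real text


def join_paragraphs(text):
    """Return one string per paragraph, with inner line breaks replaced by MARK."""
    paragraphs = re.split(r"\n{2,}", text.strip("\n"))
    return [p.replace("\n", MARK) for p in paragraphs]


def split_paragraphs(lines):
    """Undo join_paragraphs: restore line breaks and the empty separator lines."""
    return "\n\n".join(line.replace(MARK, "\n") for line in lines) + "\n"


def keep_first_paragraph_per_prefix(text):
    """Keep only the first paragraph for each 'anything up to the first colon' prefix."""
    seen = set()
    kept = []
    for line in join_paragraphs(text):
        match = re.match(r"(.*?:)", line)
        if match is None:
            kept.append(line)  # no prefix at all: leave the paragraph alone
            continue
        if match.group(1) not in seen:
            seen.add(match.group(1))
            kept.append(line)
    return split_paragraphs(kept)


sample = (
    "Some Words: first\nmore text\n\n"
    "Other: x\n\n"
    "Some Words: second\nextra\n\n"
    "Other: y\n"
)
result = keep_first_paragraph_per_prefix(sample)
print(result)
```

Once each paragraph sits on a single line, the line-based steps from the earlier post apply unchanged, which is the point of joining first.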

          The concepts are similar to what I showed you, but you’ll have to study and adapt other solutions already presented to fit your exact circumstances.

          • T
            TN MC @PeterJones
            last edited by Dec 26, 2023, 5:21 PM

            @PeterJones Will do. Thank you.
