    Remove all but first paragraph starting with similar string

    • TN MC

      Hi,
      I briefly used Notepad++ a few years ago but otherwise don’t have much experience with it. I know how to remove all lines that start with a certain string. I have searched for quite some time but cannot find a regex or macro for the following task, which is slightly different.

      I want to keep the first paragraph but remove all subsequent paragraphs that start with the same characters. For example, in the following, I want to remove LMN: 2345, XYZ: 3456 and LMN: jkl. The leading string is not constant and can vary, but it always ends with “:”.
      Thank you.

      BEFORE CODE

      XYZ: yui

      LMN: tyu

      LMN: 2345

      XYZ: 3456

      ABC: 1234

      LMN: jkl

      OPQ: 4567

      AFTER CODE

      XYZ: yui

      LMN: tyu

      ABC: 1234

      OPQ: 4567

      • PeterJones @TN MC

        @TN-MC ,

        If your document is pretty long, then the following is the sequence I would recommend:

        1. Number all the lines
          • Make sure there’s a blank line at the end of your document
          • Ctrl+HOME
          • Alt+Shift+B (Edit > Begin/End Select in Column Mode)
          • Ctrl+END, UpArrow
          • Alt+Shift+B (Edit > Begin/End Select in Column Mode)
          • If you’re in at least v8.6
            • THEN type a space, then left arrow
            • ELSE type a space, then Ctrl+HOME/Alt+Shift+B/Ctrl+END/UpArrow/Alt+Shift+B again;
          • Alt+C (Edit > Column Editor)
            **Initial Number: ** 1
            **Increase By: ** 1
            **Leading: ** Zeroes
            OK
        2. Move the numbers
          • Ctrl+H (Search > Replace)
          • FIND WHAT: ^(\d+) (\w+:) (please note: the \w+ will be different if your example data is wrong and it’s not always a single “word” before the colon)
            REPLACE WITH: $2$1:
            SEARCH MODE = Regular Expression
            REPLACE ALL
        3. Sort
          • Edit > Line Operations > Sort Lines Lexicographically Ascending
          • Make sure there’s a blank line at the end of your document
        4. Reduce to only one copy of each starting word
          • FIND WHAT: (?-s)^(\w+:)(.*\R)(\1.*\R)*
            REPLACE WITH: $1$2
            SEARCH MODE = Regular Expression
            REPLACE ALL
        5. Move the line numbers back to the start
          • FIND WHAT: ^(\w+:)(\d+:)
            REPLACE WITH: $2$1
            SEARCH MODE = Regular Expression
            REPLACE ALL
        6. Sort
          • Edit > Line Operations > Sort Lines Lexicographically Ascending
          • If you have blank lines at the beginning, you can remove those, and add a blank line at the end
        7. Remove the line numbering
          • FIND WHAT: ^(\d+:)(\w+:)
            REPLACE WITH: $2
            SEARCH MODE = Regular Expression
            REPLACE ALL
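
For anyone who wants to check the logic outside Notepad++, the seven steps above can be sketched in Python. This is only an illustration under assumptions that are not in the original post: one entry per line, no blank lines, and a single-word key before the colon. The function name `keep_first_per_prefix` is made up for this example, and step 4 uses a set of already-seen keys rather than the backreference regex.

```python
import re

def keep_first_per_prefix(lines):
    """Emulate the number / move / sort / dedupe / move back / sort / strip steps."""
    # Step 1: prefix each line with a zero-padded number and a space,
    # so that a plain text sort agrees with numeric order.
    width = len(str(len(lines)))
    numbered = [f"{i:0{width}d} {line}" for i, line in enumerate(lines, start=1)]
    # Step 2: move the number behind the leading "WORD:" key.
    moved = [re.sub(r"^(\d+) (\w+:)", r"\2\1:", s) for s in numbered]
    # Step 3: sort, so lines sharing a key sit together, earliest first.
    moved.sort()
    # Step 4: keep only the first line for each key.
    kept, seen = [], set()
    for s in moved:
        m = re.match(r"(\w+:)", s)
        key = m.group(1) if m else s
        if key not in seen:
            seen.add(key)
            kept.append(s)
    # Step 5: move the number back to the front.
    restored = [re.sub(r"^(\w+:)(\d+:)", r"\2\1", s) for s in kept]
    # Step 6: sort again to recover the original order.
    restored.sort()
    # Step 7: drop the numbering.
    return [re.sub(r"^(\d+:)(\w+:)", r"\2", s) for s in restored]
```

Calling it with the seven BEFORE lines from the question (blank lines removed) should give back the four AFTER lines in their original order.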

        • TN MC @PeterJones

          @PeterJones Thank you so much for your prompt help. I cannot wait to try it. Actually, I should have created a better example. There could be more than one word before “:”. Also, in real life, there may be a paragraph after “:” and not only a line. When it is a paragraph, I would want that deleted. Thank you.

          As an aside, I have been trying to learn if there is a difference between paragraph and line in Notepad++.

          • PeterJones @TN MC

            @TN-MC said in Remove all but first paragraph starting with similar string:

            Actually, I should have created a better example. There could be more than one word before “:”.

            Then my solution will definitely not work without you putting effort into editing the regex and making it match.

            Also, in real life, there may be a paragraph after “:” and not only a line. When it is a paragraph, I would want that deleted. Thank you.

            That will also take more effort on your part.

            As an aside, I have been trying to learn if there is a difference between paragraph and line in Notepad++.

            Notepad++ doesn’t know what you consider a paragraph. I am guessing that for you, paragraphs are separated by an empty line, but Notepad++ doesn’t know that, and you’d have to develop a regex that matches your definition. (\R\R matches the newline at the end of the paragraph and the newline used as a paragraph separator.)
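
To make the "paragraphs are separated by an empty line" definition concrete, here is a small Python sketch. Python's `re` module has no `\R`, so the `NEWLINE` pattern below is a stand-in that covers the common line endings. Both `NEWLINE` and `split_paragraphs` are names invented for this illustration.

```python
import re

# Stand-in for \R: Windows, Unix, or old Mac line endings.
NEWLINE = r"(?:\r\n|\n|\r)"

def split_paragraphs(text):
    """Split text wherever two or more newlines occur in a row."""
    # Drop leading/trailing newlines so they do not create empty paragraphs.
    parts = re.split(rf"{NEWLINE}{{2,}}", text.strip("\r\n"))
    return [p for p in parts if p]
```

A single newline inside a paragraph is left alone; only a run of two or more newlines counts as a paragraph boundary.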

            In your solution, wherever I had a single \R, you’d need two (if I’ve guessed your definition correctly). Instead of (\w+:), you’d need something like ^(.*?:). And most of your expressions would need (?s), so that . (“match any character”) also matches newlines, instead of the (?-s) that I showed in step 4, which made sure . didn’t match a newline.
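
As a rough cross-check of that adapted idea, the following Python sketch keeps only the first paragraph for each leading "anything up to the first colon" key. It is not the Notepad++ regex itself. It assumes LF line endings, blank-line-separated paragraphs, and a key that sits on the first line of the paragraph. The name `dedupe_paragraphs` is hypothetical.

```python
import re

def dedupe_paragraphs(text):
    """Keep the first paragraph for each key; drop later ones with the same key."""
    paragraphs = re.split(r"\n{2,}", text.strip("\n"))
    seen = set()
    kept = []
    for para in paragraphs:
        # Lazy match up to the first colon; '.' does not cross a newline here,
        # so the key has to be on the paragraph's first line.
        m = re.match(r"(.*?:)", para)
        key = m.group(1) if m else None
        if key is not None and key in seen:
            continue  # a paragraph with this key was already kept
        if key is not None:
            seen.add(key)
        kept.append(para)  # paragraphs without any key are always kept
    return "\n\n".join(kept)
```

Because the key is matched lazily up to the first colon, multi-word keys such as "First Key:" work the same way as single words.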

            Also, the numbering likely won’t work as well for you as it did for me, since your paragraphs go across multiple lines. My suggestion would be to join together paragraphs into a single line before doing the numbering, and then split at the end. If you need ideas on how to do that, search the forum for posts by me that include the ☺ smiley, because I often use that in examples where I’ve joined lines together.
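
The join-then-split trick can be illustrated in Python as well. This is a sketch, not a macro: it assumes LF line endings, exactly one empty line between paragraphs, and that the ☺ character does not already occur in the text. The names `PLACEHOLDER`, `join_paragraph_lines` and `split_paragraph_lines` are made up for the example.

```python
PLACEHOLDER = "☺"  # assumed not to appear anywhere in the real text

def join_paragraph_lines(text):
    """Turn each blank-line-separated paragraph into one line."""
    paragraphs = text.strip("\n").split("\n\n")
    return "\n".join(p.replace("\n", PLACEHOLDER) for p in paragraphs)

def split_paragraph_lines(joined):
    """Undo join_paragraph_lines: restore inner newlines and blank separators."""
    lines = joined.split("\n")
    return "\n\n".join(line.replace(PLACEHOLDER, "\n") for line in lines)
```

Once every paragraph is a single line, the line-based numbering and sorting steps can be applied, and the split function restores the original layout afterwards.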

            The concepts are similar to what I showed you, but you’ll have to study and adapt other solutions already presented to fit your exact circumstances.

            • TN MC @PeterJones

              @PeterJones Will do. Thank you.
