Remove all but first paragraph starting with similar string
-
Hi,
I briefly used Notepad++ a few years ago but otherwise don’t have much experience with it. I know how to remove all lines that start with a certain string. I have searched for quite some time but cannot find the code/macro for the following task, which is slightly different. I want to keep the first paragraph but remove all subsequent paragraphs that start with the same characters. For example, in the following, I want to remove LMN: 2345, XYZ: 3456 and LMN: jkl. The character string is not constant and can vary, but it will always end with “:”.
Thank you.
BEFORE CODE
XYZ: yui
LMN: tyu
LMN: 2345
XYZ: 3456
ABC: 1234
LMN: jkl
OPQ: 4567
AFTER CODE
XYZ: yui
LMN: tyu
ABC: 1234
OPQ: 4567
-
@TN-MC ,
If your document is pretty long, then the following is the sequence I would recommend:
- Number all the lines
  - Make sure there’s a blank line at the end of your document
  - Ctrl+HOME
  - Alt+Shift+B (Edit > Begin/End Select in Column Mode)
  - Ctrl+END, UpArrow
  - Alt+Shift+B (Edit > Begin/End Select in Column Mode)
  - If you’re in at least v8.6
    - THEN type a space, then left arrow
    - ELSE type a space, then Ctrl+HOME / Alt+Shift+B / Ctrl+END / UpArrow / Alt+Shift+B again
  - Alt+C (Edit > Column Editor)
    - **Initial Number:** 1
    - **Increase By:** 1
    - **Leading:** Zeroes
    - OK
- Move the numbers
  - Ctrl+H (Search > Replace)
  - FIND WHAT: `^(\d+) (\w+:)`
    (please note: the `\w+` will be different if your example data is wrong and it’s not always a single “word” before the colon)
  - REPLACE WITH: `$2$1:`
  - SEARCH MODE = Regular Expression
  - REPLACE ALL
- Sort
  - Edit > Line Operations > Sort Lines Lexicographically Ascending
  - Make sure there’s a blank line at the end of your document
- Reduce to only one copy of each starting word
  - FIND WHAT: `(?-s)^(\w+:)(.*\R)(\1.*\R)*`
  - REPLACE WITH: `$1$2`
  - SEARCH MODE = Regular Expression
  - REPLACE ALL
- Move the line numbers back to the start
  - FIND WHAT: `^(\w+:)(\d+:)`
  - REPLACE WITH: `$2$1`
  - SEARCH MODE = Regular Expression
  - REPLACE ALL
- Sort
  - Edit > Line Operations > Sort Lines Lexicographically Ascending
  - If you have blank lines at the beginning, you can remove those, and add a blank line at the end
- Remove the line numbering
  - FIND WHAT: `^(\d+:)(\w+:)`
  - REPLACE WITH: `$2`
  - SEARCH MODE = Regular Expression
  - REPLACE ALL
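For readers who would rather script this than walk through the editor dialogs, the sequence above can be sketched in Python. This is only an illustration, not part of the Notepad++ procedure: the function name `keep_first_per_prefix` is made up, and the choice to keep every line that has no `WORD:` prefix is an assumption, since the example data does not cover that case.

```python
import re


def keep_first_per_prefix(text: str) -> str:
    """Keep only the first line for each 'WORD:' prefix, preserving order."""
    decorated = []
    # "Number all the lines" / "Move the numbers": pair each line with its
    # prefix and its original line number.
    for number, line in enumerate(text.splitlines(), start=1):
        match = re.match(r"(\w+:)", line)
        # Assumption: a line without a prefix gets a unique key, so it is kept.
        key = match.group(1) if match else f"\0{number}"
        decorated.append((key, number, line))

    # "Sort": group identical prefixes together, earliest line first.
    decorated.sort(key=lambda item: (item[0], item[1]))

    # "Reduce to only one copy of each starting word".
    survivors = []
    previous_key = None
    for key, number, line in decorated:
        if key != previous_key:
            survivors.append((number, line))
        previous_key = key

    # "Move the line numbers back", "Sort", "Remove the line numbering".
    survivors.sort()
    return "\n".join(line for _, line in survivors)


sample = """XYZ: yui
LMN: tyu
LMN: 2345
XYZ: 3456
ABC: 1234
LMN: jkl
OPQ: 4567"""

result = keep_first_per_prefix(sample)
print(result)
```

On the sample from the question this prints the four lines shown under AFTER CODE. The number-sort-dedupe-sort shape mirrors the editor steps on purpose; in a script, a single pass with a set of already-seen prefixes would do the same job.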
-
@PeterJones Thank you so much for your prompt help. I cannot wait to try it. Actually, I should have created a better example. There could be more than one word before “:”. Also, in real life, there may be a paragraph after “:” and not only a line. When it is a paragraph, I would want that deleted. Thank you.
As an aside, I have been trying to learn if there is a difference between a paragraph and a line in Notepad++.
-
@TN-MC said in Remove all but first paragraph starting with similar string:

> Actually, I should have created a better example. There could be more than one word before “:”.

Then my solution will definitely not work without you putting effort into editing the regex and making it match.

> Also, in real life, there may be a paragraph after “:” and not only a line. When it is a paragraph, I would want that deleted. Thank you.

That will also take more effort on your part.

> As an aside, I have been trying to learn if there is a difference between a paragraph and a line in Notepad++.

Notepad++ doesn’t know what you consider a paragraph. I am guessing that for you, paragraphs are separated by an empty line, but Notepad++ doesn’t know that, and you’d have to develop a regex that matches your definition (`\R\R` matches the newline at the end of the paragraph and the newline used as a paragraph separator).

For your solution, in places where I had a single `\R`, you’d need two (if I guess correctly as to your definition). Instead of `(\w+:)`, you’d need something like `^(.*?:)`. And most of your expressions would need `(?s)` to tell it that you want `.` (“match any character”) to include newlines as “any character” (and to not use the `(?-s)` that I showed in step 4, the “Reduce to only one copy” step, which made sure `.` didn’t match newline).

Also, the numbering likely won’t work as well for you as it did for me, since your paragraphs go across multiple lines. My suggestion would be to join together paragraphs into a single line before doing the numbering, and then split at the end. If you need ideas on how to do that, search the forum for posts by me that include the ☺ smiley, because I often use that in examples where I’ve joined lines together.

The concepts are similar to what I showed you, but you’ll have to study and adapt other solutions already presented to fit your exact circumstances.
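To make the paragraph variant concrete, here is a rough Python sketch. It is not a Notepad++ regex solution, and it rests on the guesses made above: paragraphs are separated by at least one blank line, and the “starting string” is everything on a paragraph’s first line up to its first colon, as in `^(.*?:)`. The function name `keep_first_paragraph_per_prefix` is hypothetical, and keeping paragraphs that have no colon on their first line is a further assumption.

```python
import re


def keep_first_paragraph_per_prefix(text: str) -> str:
    """Keep the first paragraph for each 'anything up to the first colon' prefix."""
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    seen = set()
    kept = []
    for paragraph in paragraphs:
        # Without the DOTALL flag, '.' stops at a newline, so the prefix is
        # looked for on the first line of the paragraph only.
        match = re.match(r"(.*?:)", paragraph)
        key = match.group(1) if match else None
        if key is not None and key in seen:
            continue  # a later paragraph with an already-seen prefix: drop it
        if key is not None:
            seen.add(key)
        kept.append(paragraph)
    return "\n\n".join(kept)


sample = (
    "First topic: opening line\ncontinues here\n\n"
    "Second topic: something\n\n"
    "First topic: a later paragraph\nthat spans lines\n\n"
    "No prefix here"
)

result = keep_first_paragraph_per_prefix(sample)
print(result)
```

Because the script handles whole paragraphs as single strings, it has no need for the numbering or for the join-then-split workaround. Those are only required when the work is done with sort and replace inside the editor.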
-
@PeterJones Will do. Thank you.