Separating words in between dash with a line break
-
Fellow Notepad++ Users,
Could you please help me with the following regular expression problem I am having?
I am trying to separate words between dashes with a line break but I’ve scoured forums and message threads and haven’t turned up much.
Here is a sample text line.
This is a sample text---these are the words in between dashes--this is the end of the sample text
Here is how I would like it to look.
This is a sample text
---these are the words in between dashes---
this is the end of the sample text
Notice how the words in between dashes are on separate lines.
To accomplish this, I have tried using the following Find/Replace expressions and settings:
- Find What = (?=---)(.*)(?=---)
- Replace With = \n
- Search Mode = REGULAR EXPRESSION
- Dot Matches Newline = NOT CHECKED
I can only get it to look like this:
This is a sample text
---these are the words in between dashes--this is the end of the sample text
Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?
-
@Matthew-Habash said in Separating words in between dash with a line break:
Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?
I do not think you understand lookahead, nor fully understand your data.
In your Find What, the (?=---) says “the next three characters must be hyphens, but DON’T SELECT THEM YET OR MOVE THE CURSOR” – so that puts the search cursor at the beginning of the ---these are the words in between dashes---. Then (.*) says “grab 0 or more of any type of character…”. Then the second (?=---) says “… until you reach a point where the next three characters are hyphens, but don’t select those final three hyphens”.
In your example text, there is only one triple-hyphen; the second set of dashes is just a double hyphen. This means that the (.*)(?=---) matches 0 characters – because if the .* doesn’t “eat up” any characters, then the lookahead will find the original triple-hyphen as the matching point. So you’ve got a 0-width selection just before the first and only ---.
Here’s what it matches, using Find Next instead of Replace, so you can see the “selection” (which it indicates with ^ zero length match):

Then your replacement says “replace everything that was matched” (ie, the emptiness just before the ---) “with a LF sequence”, which gives you:
Please note: if your sample text had what you thought it had, with three hyphens at the end of the phrase:
This is a sample text---these are the words in between dashes---this is the end of the sample text
then your regex would have selected the first three hyphens, and the words in between:

And then the replace would have replaced all that selected text with a newline,

… which still wouldn’t be what you want.
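For readers who want to reproduce the two cases above outside Notepad++, here is a minimal sketch in Python. It assumes Python's re module treats these particular lookaheads and the greedy .* the same way Notepad++'s regex engine does, which should hold for patterns this simple but is not something the thread itself confirms. The variable names are made up for illustration, and count=1 stands in for a single Replace click.

```python
import re

# The asker's original Find What: lookahead, capture group, lookahead.
pattern = re.compile(r"(?=---)(.*)(?=---)")

# The sample as posted: the second run of dashes has only two hyphens.
actual = ("This is a sample text---these are the words in between dashes"
          "--this is the end of the sample text")
# The sample as the asker meant it: three hyphens on both sides.
intended = ("This is a sample text---these are the words in between dashes"
            "---this is the end of the sample text")

# Case 1: only one "---" exists, so both lookaheads are satisfied at the
# same position and (.*) captures nothing: a zero-width match.
m_actual = pattern.search(actual)
print(m_actual.span())          # start equals end
result_actual = pattern.sub("\n", actual, count=1)
print(repr(result_actual))      # a newline inserted just before "---"

# Case 2: with a second "---", the greedy (.*) runs up to it, so the
# match covers the first hyphens plus the words in between.
m_intended = pattern.search(intended)
print(repr(m_intended.group(0)))
result_intended = pattern.sub("\n", intended, count=1)
print(repr(result_intended))    # the matched text is gone, replaced by "\n"
```

In both cases the lookaheads keep the trailing hyphens out of the match, and the replacement has no way to put the matched text back, which is why neither result is the desired three-line layout.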
Based on your “how I would like it to look”, I think what you really intended to do was to put a newline before the first --- and another newline after the second --- (if there really were a second ---). To do that, I would do the following:
- Find What: ---.*---
- Replace With: \r\n$0\r\n
- Search Mode: Regular Expression
- . matches newline not checked
The Find What is simpler: you want to match everything, including the hyphens, so no reason for the lookaheads and groups.
In the Replace With, I am using $0 to say “include the contents of everything that was matched”, so that your original text isn’t lost. I am also using \r\n instead of just \n, because I am assuming you really have Windows-style newlines (CRLF), not Unix/Linux newlines (LF only). If you actually do have Linux LF only, then use Replace With: \n$0\n
Results:
FIND NEXT =>

REPLACE ALL =>

This is a sample text
---these are the words in between dashes---
this is the end of the sample text
But my regex will not work for your original single line, because its second set of dashes has only two hyphens, not three, so it will find no matches and do no replacements.
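The suggested Find What and Replace With can also be tried in Python as a rough cross-check. This is a sketch under assumptions: put_on_own_line and its eol parameter are hypothetical names, m.group(0) plays the role of Notepad++'s $0, and Python's re is assumed to agree with Notepad++ on a pattern this simple.

```python
import re

# Same Find What as suggested above. By default "." does not match
# newlines in Python, mirroring ". matches newline" left unchecked.
FIND_WHAT = re.compile(r"---.*---")

def put_on_own_line(text: str, eol: str = "\r\n") -> str:
    """Wrap each ---...--- match in line breaks, keeping the match itself."""
    # m.group(0) is the whole match, like $0 in the Replace With field.
    return FIND_WHAT.sub(lambda m: eol + m.group(0) + eol, text)

# Three hyphens on both sides: the phrase lands on its own line.
intended = ("This is a sample text---these are the words in between dashes"
            "---this is the end of the sample text")
fixed = put_on_own_line(intended, eol="\n")  # LF here so it prints cleanly
print(fixed)

# The sample as originally posted ends the phrase with only two hyphens,
# so the pattern finds nothing and the text comes back unchanged.
actual = ("This is a sample text---these are the words in between dashes"
          "--this is the end of the sample text")
unchanged = put_on_own_line(actual, eol="\n")
print(unchanged == actual)
```

Passing eol="\r\n" instead corresponds to the CRLF form of the Replace With field.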
-
@PeterJones Thank you for the explanation and my apologies for the confusion. I meant for both sides of lines to have the same number of dashes. I guess I did not catch that.
Your regex code worked and that was what I needed. Thank you very much!