
    Separating words in between dashes with a line break

    • Matthew Habash

      Fellow Notepad++ Users,

      Could you please help me with the following regular expression problem I am having?

      I am trying to separate words between dashes with a line break but I’ve scoured forums and message threads and haven’t turned up much.

      Here is a sample text line.

      This is  a sample text---these are the words in between dashes--this is the end of the sample text
      

      Here is how I would like it to look.

      This is  a sample text
      ---these are the words in between dashes---
      this is the end of the sample text
      

      Notice how the words in between dashes are on a separate line.

      To accomplish this, I have tried using the following Find/Replace expressions and settings:

      • Find What = (?=---)(.*)(?=---)
      • Replace With = \n
      • Search Mode = REGULAR EXPRESSION
      • Dot Matches Newline = NOT CHECKED

      I can only get it to look like this:
      This is a sample text
      ---these are the words in between dashes--this is the end of the sample text

      Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?

      • PeterJones @Matthew Habash

        @Matthew-Habash said in Separating words in between dashes with a line break:

        Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?

        I do not think you understand lookahead, nor fully understand your data.

        In your Find What, the (?=---) says “the next three characters must be hyphens, but DON’T SELECT THEM YET OR MOVE THE CURSOR” – so that puts the search cursor at the beginning of the ---these are the words in between dashes---. Then (.*) says “grab 0 or more of any type of character…”. Then the second (?=---) says “… until you reach a point where the next three characters are hyphens, but don’t select those final three hyphens”.

        In your example text, there is only one triple-hyphen; the second set of dashes is just a double hyphen. This means that the (.*)(?=---) matches 0 characters – because the only place the second lookahead can succeed is at the original triple-hyphen, which happens when the .* doesn’t “eat up” any characters. So you’ve got a 0-width selection just before the first and only ---.

        Here’s what it matches, using Find Next instead of Replace, so you can see the “selection” (which it indicates with ^ zero length match):
        [Screenshot: Find Next showing a zero-length match just before the ---]

        Then your replacement says “replace everything that was matched” (ie, the emptiness just before the ---) “with a LF sequence”, which gives you:

        [Screenshot: the sample line with a line break inserted just before the ---]
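The zero-width match described above can be reproduced outside Notepad++. This is only a minimal sketch: it assumes Python's built-in re module handles these lookaheads the same way Notepad++'s regex engine does, which is an assumption made for illustration and not something confirmed in this thread.

```python
import re

# Original sample line: one triple hyphen, then only a double hyphen.
text = ("This is  a sample text---these are the words in between dashes"
        "--this is the end of the sample text")

pattern = r"(?=---)(.*)(?=---)"

match = re.search(pattern, text)
span = match.span()       # start and end of the match are the same position
matched = match.group(0)  # nothing was consumed, so this is an empty string

# Replacing the empty match inserts a line feed just before the "---".
replaced = re.sub(pattern, "\n", text, count=1)

print(span, repr(matched))
print(replaced)
```

Both lookaheads are satisfied at the same position, so the match has no width and the replacement only adds a line break there.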

        Please note: if your sample text had been what you thought it was, with three hyphens at the end of the phrase:

        This is  a sample text---these are the words in between dashes---this is the end of the sample text
        

        then your regex would have selected the first three hyphens, and the words in between:

        [Screenshot: Find Next selecting ---these are the words in between dashes]

        And then the replace would have replaced all that selected text with a newline,
        [Screenshot: the selected text replaced by a line break]

        … which still wouldn’t be what you want.
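The same kind of check works for the three-hyphen variant. Again this is a hedged sketch in Python's re module, assumed (not confirmed here) to match Notepad++'s behaviour for this pattern; count=1 stands in for a single Replace.

```python
import re

# Sample line as it was meant to be: three hyphens on both sides of the phrase.
text = ("This is  a sample text---these are the words in between dashes"
        "---this is the end of the sample text")

pattern = r"(?=---)(.*)(?=---)"

match = re.search(pattern, text)
# The match covers the leading hyphens and the words, but not the trailing hyphens.
matched = match.group(0)

# A single replacement throws away the matched text and leaves only a line feed.
replaced = re.sub(pattern, "\n", text, count=1)

print(repr(matched))
print(replaced)
```

The words between the dashes are deleted because the replacement never refers back to what was matched.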

        Based on your “how I would like it to look”, I think what you really intended to do was to put a newline before the first --- and another newline after the second --- (if there really were a second ---). To do that, I would do the following:

        • Find What: ---.*---
        • Replace With: \r\n$0\r\n
        • Search Mode: Regular Expression
        • . matches newline not checked

        The Find What is simpler: you want to match everything, including the hyphens, so no reason for the lookaheads and groups.
        In the Replace With, I am using $0 to say “include the contents of everything that was matched”, so that your original text isn’t lost. I am also using \r\n instead of just \n, because I am assuming you really have Windows-style newlines (CRLF), not Unix/Linux newlines (LF only). If you actually do have Linux LF only, then use Replace With: \n$0\n instead.

        Results:
        FIND NEXT =>
        [Screenshot: Find Next selecting ---these are the words in between dashes---]

        REPLACE ALL =>
        [Screenshot: result of Replace All, reproduced as text below]

        This is  a sample text
        ---these are the words in between dashes---
        this is the end of the sample text
        

        But my regex will not work on your original single line, because its second set of dashes has only two hyphens, not three, so it will find no matches and make no replacements.
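For readers who want to try the suggested replacement outside Notepad++, here is a minimal sketch in Python. It rests on two assumptions: that Python's re module matches ---.*--- the same way Notepad++ does, and that LF line endings are wanted (use \r\n in the replacement for CRLF). Python spells the whole-match reference as \g<0> where Notepad++ uses $0.

```python
import re

# Sample line with three hyphens on both sides of the phrase.
fixed = ("This is  a sample text---these are the words in between dashes"
         "---this is the end of the sample text")
# Sample line as originally posted: the second run has only two hyphens.
original = ("This is  a sample text---these are the words in between dashes"
            "--this is the end of the sample text")

pattern = r"---.*---"
replacement = r"\n\g<0>\n"  # line feed, the whole match, line feed

# The match is kept and wrapped in line breaks.
wrapped = re.sub(pattern, replacement, fixed)

# Only one triple hyphen exists here, so nothing matches and the text is unchanged.
untouched = re.sub(pattern, replacement, original)

print(wrapped)
print(untouched == original)
```

Because .* is greedy, a line containing more than two triple hyphens would be matched from the first one to the last one.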

        • Matthew Habash @PeterJones

          @PeterJones Thank you for the explanation, and my apologies for the confusion. I meant for both sides of the phrase to have the same number of dashes. I guess I did not catch that.

          Your regex code worked and that was what I needed. Thank you very much!
