
    Regex: How can I split lines from text at every 31 words

    • Neculai I. Fantanaru
      last edited by Neculai I. Fantanaru

      Hello. I have a text file with more than 2,000 words. I want to split the text into multiple lines, inserting a line break after every 31 words.

      My regex doesn't work, and I don't know why.

      SEARCH: (\w+\W+){31}
      REPLACE BY: \r\n

      • Terry R
        Terry R @Neculai I. Fantanaru
        last edited by

        @Neculai-I-Fantanaru said in Regex: How can I split lines from text at every 31 words:

        My regex doesn’t work.

        You don't say why it doesn't work, but my thought is that you are replacing each group of 31 words with an EOL, so the words themselves are discarded, resulting in an empty file (well, actually a file with many empty lines).
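To see this diagnosis concretely, here is a minimal Python sketch. It assumes Python's standard `re` module behaves like the Notepad++ regex engine for this simple pattern (both treat `\w` as word characters and `\W` as everything else), and it uses a hypothetical 70-word sample text:

```python
import re

# Hypothetical sample: 70 numbered words, so group boundaries are easy to see.
text = " ".join(f"w{i}" for i in range(1, 71))

# The original attempt: each run of 31 words is *replaced* by CRLF,
# so the matched words themselves are discarded.
broken = re.sub(r"(\w+\W+){31}", "\r\n", text)
print(repr(broken))  # → '\r\n\r\nw63 w64 w65 w66 w67 w68 w69 w70'
```

Only the last 8 words survive, because they never form a complete group of 31; the first 62 words are replaced by the two EOLs.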

        If you match something (as your find expression does), the replacement has to write it back, otherwise it is lost. One way is to put () around the whole find expression and use \1\r\n as the replacement. Alternatively, just add \K to the end of the find expression: the regex first finds a group of 31 words, then \K discards that selection, leaving the cursor just after it, ready to insert the EOL.
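A minimal Python sketch of the capture-group fix, using the same hypothetical 70-word sample. Python's standard `re` module does not support `\K`, so only the `\1\r\n` variant is shown; in a Python replacement template, `\r` and `\n` are turned into CR and LF, as in Notepad++:

```python
import re

text = " ".join(f"w{i}" for i in range(1, 71))

# Outer () captures the whole run of 31 words; \1 writes it back,
# then \r\n appends the line break after it.
fixed = re.sub(r"((\w+\W+){31})", r"\1\r\n", text)
lines = fixed.split("\r\n")
print([len(line.split()) for line in lines])  # → [31, 31, 8]
```

Each line keeps the space that `\W+` matched after its 31st word, so the first two lines end with a trailing space; the last line holds the 8 leftover words.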

        Terry

        • Neculai I. Fantanaru
          last edited by

          @Terry-R

          I don't get it. Could you please write out the full solution?

          • Neculai I. Fantanaru
            last edited by

            @Terry-R said in Regex: How can I split lines from text at every 31 words:

            You don’t say why it doesn’t work but my thought is that you are replacing groups of 31 words with an EOL resulting in an empty file, well actually a file with many empty lines

            Thanks. Here are two solutions:

            SEARCH: (\w+\W+){31}\K
            REPLACE BY: \r\n

            OR

            SEARCH: (\w+\W+){31}
            REPLACE BY: $0\r\n\r\n

            • Terry R
              Terry R @Neculai I. Fantanaru
              last edited by Terry R

              @Neculai-I-Fantanaru said in Regex: How can I split lines from text at every 31 words:

              SEARCH: (\w+\W+){31}
              REPLACE BY: $0\r\n\r\n

              Yes, that works as well: $0 stands for the whole match, i.e. everything currently selected. You may want to remove the extra \r\n, though, as otherwise you will get two EOLs in sequence (an empty line after every group).
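The difference between the two replacements can be checked with a short Python sketch (hypothetical sample text again); Python spells the Notepad++ `$0` as `\g<0>`:

```python
import re

text = " ".join(f"w{i}" for i in range(1, 71))
pattern = r"(\w+\W+){31}"

# \g<0> re-inserts the whole match, like $0 in Notepad++.
single = re.sub(pattern, r"\g<0>\r\n", text)      # one EOL per group
double = re.sub(pattern, r"\g<0>\r\n\r\n", text)  # two EOLs per group

print(len(single.split("\r\n")), len(double.split("\r\n")))  # → 3 5
```

With the doubled EOL, the five "lines" include two empty ones, i.e. a blank line between the groups.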

              Terry

              • Neculai I. Fantanaru
                last edited by

                @Terry-R said in Regex: How can I split lines from text at every 31 words:

                $0\r\n\

                My little problem is the quote (apostrophe) in words such as don’t, didn"t, doesn’t

                It seems my regex sees that t as a separate word…

                • PeterJones
                  PeterJones @Neculai I. Fantanaru
                  last edited by PeterJones

                  @Neculai-I-Fantanaru

                  don’t, didn"t, doesn’t

                  I am assuming you meant

                  don't, didn't, doesn't

                  … where it is an ASCII apostrophe, not the forum’s smart-quote, and not the ASCII-double-quote that was in the second word.

                  You told the regex engine to look for characters in the posix class “word”, followed by non-“word” characters. The “word” class does not include the apostrophe. So with your example text and the sub-expression \w+\W+, it finds the groups don' then t, then didn' then t, then doesn' then t[EOL], i.e. six “words” instead of three.
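The split can be reproduced in Python with `re.findall`, which lists every piece the sub-expression `\w+\W+` matches (a sketch; Python's `\w` likewise excludes the apostrophe):

```python
import re

sample = "don't, didn't, doesn't\n"

# Each piece is one "word" as far as (\w+\W+) is concerned.
print(re.findall(r"\w+\W+", sample))
# → ["don'", 't, ', "didn'", 't, ', "doesn'", 't\n']
```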

                  If you want to allow other characters inside the group of characters that you consider a word, they need to be specified in the same character class as the other word characters, such as [\w']+\W+. If it really might be a smart single quote or an ASCII apostrophe, then use [\w'’]+\W+.
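The same Python check with the widened class, using an assumed sample that mixes the ASCII apostrophe with the typographic ’ (U+2019):

```python
import re

sample = "don't, didn’t, doesn't\n"  # ASCII ' and typographic ’ mixed

# Apostrophes now belong to the "word" class, so each word stays whole.
print(re.findall(r"[\w'’]+\W+", sample))
# → ["don't, ", 'didn’t, ', "doesn't\n"]
```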

                  • Neculai I. Fantanaru
                    last edited by Neculai I. Fantanaru

                    @PeterJones said in Regex: How can I split lines from text at every 31 words:

                    [\w'’]+\W+

                    Thanks, super! So here is my updated answer:

                    FIND: ([\w'’"]+\W+){31}
                    REPLACE BY: $0\r\n

                    OR

                    FIND: ([\w'’"]+\W+){31}\K
                    REPLACE BY: \r\n
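Putting the final pattern together, here is a Python sketch of the whole operation. The function name `split_every_n_words` and its parameters are hypothetical; it uses the `$0\r\n` form (Python's standard `re` has no `\K`) and passes a function as the replacement so the EOL string is inserted literally:

```python
import re


def split_every_n_words(text: str, n: int = 31, eol: str = "\r\n") -> str:
    """Insert `eol` after every run of `n` words (hypothetical helper).

    A "word" is a run of word characters, ASCII apostrophe, typographic
    apostrophe (U+2019) or double quote, followed by non-word characters,
    mirroring the FIND pattern ([\\w'’"]+\\W+){31} from this thread.
    """
    pattern = re.compile(r"""([\w'’"]+\W+){%d}""" % n)
    # Equivalent to REPLACE BY: $0\r\n, i.e. keep the match, then add the EOL.
    return pattern.sub(lambda m: m.group(0) + eol, text)


sample = " ".join(["don't", "didn’t", 'didn"t'] * 21)  # 63 words
result = split_every_n_words(sample)
print([len(line.split()) for line in result.split("\r\n")])  # → [31, 31, 1]
```

Any words left over after the last complete group of 31 (here, one) stay together on the final line, just as in Notepad++.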

                    The Community of users of the Notepad++ text editor.