Regex: How can I split lines from text at every 31 words

  • N
    Neculai I. Fantanaru
    last edited by Neculai I. Fantanaru Sep 8, 2021, 5:52 AM Sep 8, 2021, 5:50 AM

    Hello. I have a text file with more than 2,000 words. I want to split the text into lines, with a line break after every 31 words.

    My regex doesn’t work. Don’t know why…

    SEARCH: (\w+\W+){31}
    REPLACE BY: \r\n

    • T
      Terry R @Neculai I. Fantanaru
      last edited by Sep 8, 2021, 6:31 AM

      @Neculai-I-Fantanaru said in Regex: How can I split lines from text at every 31 words:

      My regex doesn’t work.

      You don’t say why it doesn’t work, but my thought is that you are replacing each group of 31 words with an EOL, resulting in an empty file (well, actually a file with many empty lines).

      If you match something (as your find code does), then you need to write it back. That means putting () around all the find code and using \1\r\n as the replace code. Alternatively, just add \K to the end of the find code. This first finds a group of 31 words, then forgets that match, leaving the cursor in place to insert the EOL.

      Terry
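
The difference between discarding the match and writing it back can be sketched outside Notepad++ as well. The following is a minimal Python illustration, not the Notepad++ engine itself: it assumes groups of 3 words instead of 31 to keep the sample short, uses \n rather than \r\n, and only shows the capture-group variant because Python's standard re module has no \K.

```python
import re

# Hypothetical sample text; groups of 3 words stand in for the 31 in the thread.
text = "one two three four five six seven "

# Replacing each group with only a newline throws the matched words away.
lost = re.sub(r"(\w+\W+){3}", "\n", text)

# Wrapping the whole pattern in a capture group and writing \1 back keeps them.
kept = re.sub(r"((?:\w+\W+){3})", r"\1\n", text)

print(repr(lost))
print(repr(kept))
```

The trailing "seven " is left alone in both cases because it does not complete a full group of 3 words.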

      • N
        Neculai I. Fantanaru
        last edited by Sep 8, 2021, 7:15 AM

        @Terry-R

        I don’t get it. Can you please write your solution?

        • N
          Neculai I. Fantanaru
          last edited by Sep 8, 2021, 7:21 AM

          @Terry-R said in Regex: How can I split lines from text at every 31 words:

          You don’t say why it doesn’t work, but my thought is that you are replacing each group of 31 words with an EOL, resulting in an empty file (well, actually a file with many empty lines).

          Thanks. These are 2 solutions:

          SEARCH: (\w+\W+){31}\K
          REPLACE BY: \r\n

          OR

          SEARCH: (\w+\W+){31}
          REPLACE BY: $0\r\n\r\n

          • T
            Terry R @Neculai I. Fantanaru
            last edited by Terry R Sep 8, 2021, 8:17 AM Sep 8, 2021, 8:15 AM

            @Neculai-I-Fantanaru said in Regex: How can I split lines from text at every 31 words:

            SEARCH: (\w+\W+){31}
            REPLACE BY: $0\r\n\r\n

            Yes, that works as well: $0 stands for the entire match. You may want to remove the extra \r\n though, as you will otherwise get 2 EOLs in sequence.

            Terry
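
To see the effect of the extra EOL, here is a small Python sketch. It is an analogy under stated assumptions: Python spells the whole-match reference as \g<0> where Notepad++ uses $0, the sample uses groups of 3 words instead of 31, and \n stands in for \r\n.

```python
import re

# Hypothetical sample text with exactly two groups of 3 words.
text = "a b c d e f "

# One line break after each group: consecutive lines, no gaps.
single = re.sub(r"(\w+\W+){3}", r"\g<0>\n", text)

# Two line breaks after each group: an empty line between groups.
double = re.sub(r"(\w+\W+){3}", r"\g<0>\n\n", text)

print(repr(single))
print(repr(double))
```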

            • N
              Neculai I. Fantanaru
              last edited by Sep 8, 2021, 12:49 PM

              @Terry-R said in Regex: How can I split lines from text at every 31 words:

              $0\r\n\

              My little problem is the quote (apostrophe) in words such as don’t, didn"t, doesn’t

              It seems my regex sees that t as a different word…

              • P
                PeterJones @Neculai I. Fantanaru
                last edited by PeterJones Sep 8, 2021, 1:39 PM Sep 8, 2021, 1:38 PM

                @Neculai-I-Fantanaru

                don’t, didn"t, doesn’t

                I am assuming you meant

                don't, didn't, doesn't

                … where it is an ASCII apostrophe, not the forum’s smart-quote, and not the ASCII-double-quote that was in the second word.

                You told the regex engine to look for characters in the posix-class “word”, followed by non-“word” characters. The “word” posix class does not include the apostrophe. So with your example text and the sub-expression \w+\W+, it is finding the groups don' then t, then didn' then t, then doesn' then t[EOL].

                If you want to allow other characters inside the group of characters that you consider a word, they need to be specified in the same character class as the other word-characters, such as [\w']+\W+ . If it really might be smart-single-quote or ASCII apostrophe, then [\w'’]+\W+
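
The grouping described above can be reproduced with a short Python sketch. It assumes Python's \w and \W treat the apostrophe the same way as the Notepad++ engine does for this sample, which holds for plain ASCII text like this; the sample string is made up for illustration.

```python
import re

# Sample text with ASCII apostrophes, ending in a newline as the EOL.
text = "don't didn't doesn't\n"

# \w does not match the apostrophe, so each contraction splits into two "words".
plain = re.findall(r"\w+\W+", text)

# Adding ' and the smart quote to the character class keeps each contraction whole.
with_apostrophe = re.findall(r"[\w'’]+\W+", text)

print(plain)
print(with_apostrophe)
```

With the plain pattern, three contractions count as six words, so a 31-word group would end earlier than expected.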

                • N
                  Neculai I. Fantanaru
                  last edited by Neculai I. Fantanaru Sep 8, 2021, 2:10 PM Sep 8, 2021, 2:08 PM

                  @PeterJones said in Regex: How can I split lines from text at every 31 words:

                  [\w'’]+\W+

                  Thanks, super. So I'll update my answer:

                  FIND: ([\w'’"]+\W+){31}
                  REPLACE BY: $0\r\n

                  OR

                  FIND: ([\w'’"]+\W+){31}\K
                  REPLACE BY: \r\n
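
For anyone who wants to do the same outside Notepad++, the final pattern can be wrapped in a small Python helper. This is only a sketch: the function name split_every_n_words and its parameters are made up for illustration, it inserts \n rather than \r\n, and it writes the match back explicitly because Python's standard re module has no \K.

```python
import re

# Characters treated as part of a word: \w plus ', the smart quote and ".
WORD = r"""[\w'’"]+"""

def split_every_n_words(text, n=31):
    """Insert a newline after every n words (hypothetical helper)."""
    pattern = re.compile("(?:" + WORD + r"\W+){" + str(n) + "}")
    # Write the whole match back, then append the line break.
    return pattern.sub(lambda m: m.group(0) + "\n", text)

result = split_every_n_words("it's a test don't stop now ", n=3)
print(repr(result))
```

As with the Notepad++ version, the separator after the last word of each group stays at the end of the line, and a final word with no non-word character after it is not counted.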

                  The Community of users of the Notepad++ text editor.