Regex: How can I split lines from text at every 31 words
-
Hello. I have a text file with more than 2,000 words. I want to split the text into many lines, with a line break after every 31 words.
My regex doesn’t work, and I don’t know why…
SEARCH:
(\w+\W+){31}
REPLACE BY: \r\n
-
@Neculai-I-Fantanaru said in Regex: How can I split lines from text at every 31 words:
My regex doesn’t work.
You don’t say why it doesn’t work, but my guess is that you are replacing each group of 31 words with an EOL, resulting in an empty file (well, actually a file with many empty lines).
If you match something (as your find expression does) and want to keep it, you need to write it back in the replacement. That means putting () around the whole find expression and using \1\r\n as the replacement. Alternatively, just add
\K
to the end of the find expression. This first matches a group of 31 words, then discards that match, leaving the position right after it, ready to insert the EOL.
Terry
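For anyone who wants to see the difference outside the editor, here is a minimal sketch in Python. It assumes groups of 3 words instead of 31 to keep the sample short, and the variable names are only for illustration. Python's built-in re module does not support \K, so only the "write the match back" variant is shown.

```python
import re

text = "one two three four five six seven"

# Groups of 3 words (instead of 31) keep the example short.
pattern = r"(\w+\W+){3}"

# Replacing each match with only a newline discards the matched words.
lost = re.sub(pattern, "\n", text)

# Wrapping the whole expression in an outer group and writing it back
# with \1 keeps the words and appends the newline after them.
kept = re.sub(r"((\w+\W+){3})", r"\1\n", text)

print(repr(lost))
print(repr(kept))
```

The trailing word "seven" is left alone in both cases because it is not followed by any non-word character, so it cannot complete another group.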
-
I don’t get it. Can you please write your solution?
-
@Terry-R said in Regex: How can I split lines from text at every 31 words:
You don’t say why it doesn’t work but my thought is that you are replacing groups of 31 words with an EOL resulting in an empty file, well actually a file with many empty lines
Thanks. These are 2 solutions:
SEARCH:
(\w+\W+){31}\K
REPLACE BY: \r\n
OR
SEARCH:
(\w+\W+){31}
REPLACE BY: $0\r\n\r\n
-
@Neculai-I-Fantanaru said in Regex: How can I split lines from text at every 31 words:
SEARCH: (\w+\W+){31}
REPLACE BY: $0\r\n\r\n
Yes, that works as well, using
$0
which stands for the entire matched text. You may want to remove the extra \r\n
though, as you will get 2 EOLs in sequence.
Terry
-
@Terry-R said in Regex: How can I split lines from text at every 31 words:
$0\r\n\
My little problem is the apostrophe in words such as
don’t, didn"t, doesn’t
It seems that my regex sees the
t
as a separate word…
-
don’t, didn"t, doesn’t
I am assuming you meant
don't, didn't, doesn't
… where it is an ASCII apostrophe, not the forum’s smart-quote, and not the ASCII double-quote that was in the second word.
You told the regex engine to look for characters in the “word” character class (\w), followed by non-“word” characters. The “word” class does not include the apostrophe. So with your example text and the sub-expression
\w+\W+
it finds these groups in turn:
don'
then t,
then didn'
then t,
then doesn'
then t[EOL]
If you want to allow other characters inside the group of characters that you consider a word, they need to be specified in the same character class as the other word characters, such as
[\w']+\W+
If it really might be either a smart single quote or an ASCII apostrophe, then
[\w'’]+\W+
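The grouping described above can be checked with a short Python sketch. It assumes the sample text uses ASCII apostrophes and ends with a single newline, and the variable names are illustrative only.

```python
import re

sample = "don't, didn't, doesn't\n"

# \w does not include the apostrophe, so each word is cut in two.
split_words = re.findall(r"\w+\W+", sample)

# Adding the ASCII apostrophe and the smart quote to the class
# keeps each contraction together as one "word".
whole_words = re.findall(r"[\w'’]+\W+", sample)

print(split_words)
print(whole_words)
```

With the first pattern each contraction counts as two words, which is why a group of 31 "words" ends earlier than expected.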
-
@PeterJones said in Regex: How can I split lines from text at every 31 words:
[\w'’]+\W+
Thanks, super. So I’ll update my answer:
FIND:
([\w'’"]+\W+){31}
REPLACE BY: $0\r\n
OR
FIND:
([\w'’"]+\W+){31}\K
REPLACE BY: \r\n
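For reference, the same idea can be scripted outside the editor. The following is only a sketch in Python: break_every is a hypothetical helper name, the word count is a parameter, and \g<0> plays the role of $0 (Python's re module has no \K).

```python
import re


def break_every(text: str, n: int = 31) -> str:
    """Insert \\r\\n after every n words (hypothetical helper).

    A "word" here is a run of word characters, ASCII apostrophes,
    smart single quotes or double quotes, as in the FIND expression.
    """
    pattern = r"""(?:[\w'’"]+\W+){%d}""" % n
    # \g<0> writes the whole match back, like $0 in the editor.
    return re.sub(pattern, r"\g<0>\r\n", text)


print(repr(break_every("I don't know why it doesn't work", 3)))
```

Note that the whitespace matched by \W+ stays at the end of each line, exactly as with the editor replacement, and a final group with fewer than n words is left unchanged.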