Auto insert break text every x amount of lines
-
Hi there
Hope I can get some help
I have a large text document and want to auto-insert some break text every 200 lines
(i.e. at lines 200, 400, 600, etc.)
The break text I want to use is a row of equal signs, like this:
=========================================
Hope someone can point me in the right direction
Thanks for the help
-
EDIT: @Andrew-Casey Ignore this post and see the post from @Terry-R below.
I tried the obvious way to do this (using a regular expression to match groups of 200 lines), and it failed with an error message about the complexity of the search being too great. But I can think of another way, if what you want is 200 lines of text, then a separator, then 200 lines of text, then a separator, and so on. (That means the separators will follow the original lines 200, 400, 600, etc., putting them at lines 201, 402, 603, etc. in the changed file.)
Put the cursor at the beginning of the file. Select Edit | Column Editor… from the menu. Select Number to Insert, fill in:
Initial number: 0
Increase by: 5
Repeat: 1
Leading: Zeros
and be sure Format is Dec; then click OK. Make a note of how many digits were added at the beginning of each line.
Now select Search | Replace… from the menu and fill in:
Find what: ^..000
Replace with: -----==========\r\n-----
but adjust the number of dots in the Find what expression to be three less than the number of digits that were added to each line, and adjust the number of dashes at the beginning and at the end of the Replace with string so that each group has the same number of dashes as there were digits added to each line. (The sample above would be correct if there were five digits added to each line.) Use however many equals signs you want.
Set the Search Mode to Regular expression and click Replace All.
You’ll have a line of equal signs you probably don’t want at the beginning of the file. You can delete that.
Now, put the cursor at the very beginning of the file. Scroll all the way to the bottom using the scroll bar (don’t click in the file). Now, hold down the Shift and Alt keys and click between the end of the added number and the beginning of the original text on the very last line. That will create a rectangular selection enclosing the added numbers. Then press the Delete key.
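A small Python simulation may help show why the Column Editor trick works: numbering the lines 0, 5, 10, … makes the number end in 000 exactly on every 200th line. This is only a sketch under assumptions (a five-digit counter, LF line endings, a hypothetical helper name `column_editor_method`); it imitates the Notepad++ steps with Python's `re` rather than driving Notepad++ itself.

```python
import re


def column_editor_method(lines, sep="=" * 10, width=5):
    """Emulate the Column Editor + Replace steps (hypothetical helper, not Notepad++ code)."""
    # Column Editor: initial 0, increase by 5, repeat 1, leading zeros.
    # With `width` digits the counter must stay below 10**width (20,000 lines for width 5).
    if 5 * (len(lines) - 1) >= 10 ** width:
        raise ValueError("too many lines for this counter width")
    numbered = [f"{5 * i:0{width}d}{line}" for i, line in enumerate(lines)]
    text = "\n".join(numbered)

    # Find what: ^ then (width - 3) dots then 000, i.e. ^..000 for five digits.
    # 5 * i ends in 000 exactly when i is a multiple of 200.
    pattern = re.compile("^" + "." * (width - 3) + "000", re.MULTILINE)
    dashes = "-" * width
    # Replace with: dashes, equal signs, line break, dashes (keeps the column aligned).
    text = pattern.sub(lambda m: dashes + sep + "\n" + dashes, text)

    out = text.split("\n")
    # The first line was numbered 00000, so it matched too: drop that separator.
    out = out[1:]
    # Rectangular selection + Delete: strip the first `width` characters of every line.
    return [line[width:] for line in out]


result = column_editor_method([f"line {n}" for n in range(1, 601)])
print(len(result), result[199], result[200], result[201])
```

With 600 sample lines, the separators land on output lines 201 and 402, matching the 201, 402, 603, … pattern described above.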
-
@Coises said in Auto insert break text every x amount of lines:
and it failed with an error message about the complexity of the search being too great.
I am interested in seeing what your regular expression (regex) was as when I tested my version, it worked fine.
My regex is (@Andrew-Casey, make sure the search mode is Regular expression and click Replace All):
Find What: (?-s)^((.+)?\R){200}\K
Replace With: ========================\r\n
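For anyone who wants to try the same idea outside Notepad++, here is a rough Python translation of this expression. It is an approximation, not an exact port: Python's `re` has no `\K` or `\R`, so the whole match is written back with the separator appended (like a `$0` replacement), and `\R` is approximated by `\r?\n`. The helper name `insert_every` is made up for illustration.

```python
import re


def insert_every(text, n, sep="=" * 24):
    r"""Insert `sep` as its own line after every `n` lines (hypothetical helper).

    Rough Python counterpart of (?-s)^((.+)?\R){n}\K: Python's re has no \K or \R,
    so the whole match is written back before the separator and \R becomes \r?\n.
    """
    pattern = re.compile(r"^(?:(?:.+)?\r?\n){%d}" % n, re.MULTILINE)
    # Reuse the file's own line ending for the separator line.
    newline = "\r\n" if "\r\n" in text else "\n"
    return pattern.sub(lambda m: m.group(0) + sep + newline, text)


sample = "".join(f"line {i}\n" for i in range(1, 11))
print(insert_every(sample, 4, sep="===="))
```

On the ten sample lines this inserts `====` after lines 4 and 8; the last two lines form a partial block and get no separator.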
Did you employ the \K function? Maybe not using it is what caused the overload of the regex engine.
I also considered another version:
Find What: (?-s)^(((.+)?\R){20}\K){10}
with the same Replace string as above. The only benefit of this might be that the regex engine only has to store a subset of the lines before being reset; the two numbers (20 and 10) just have to multiply to the required total number of lines.
Terry
PS: my test lines were all about 240 characters long.
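The nested form gives the same blocks because the repeat counts multiply: `{20}` inside `{10}` consumes 20 × 10 = 200 lines. A quick check in Python's `re` (which has no `\K`, so the match-start reset mentioned above is not reproduced here; the sample text is made up):

```python
import re

# 450 numbered lines of sample text (made-up test data).
text = "".join(f"row {i}\n" for i in range(1, 451))

flat = re.compile(r"^(?:.*\n){200}", re.MULTILINE)
nested = re.compile(r"^(?:(?:.*\n){20}){10}", re.MULTILINE)

# 20 x 10 = 200, so both patterns should match the same 200-line blocks.
flat_spans = [m.span() for m in flat.finditer(text)]
nested_spans = [m.span() for m in nested.finditer(text)]
print(len(flat_spans), flat_spans == nested_spans)
```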
-
@Terry-R said in Auto insert break text every x amount of lines:
I am interested in seeing what your regular expression (regex) was as when I tested my version, it worked fine.
Yours works fine for me, too.
This is all very strange, and I haven’t yet figured out what is happening. My expression was:
((.*?)\r\n){199}
replacing with:
$0=========================================\r\n
and the test file was 1312 lines, averaging around 150 characters each.
Here’s where I’m baffled. That was in 8.5.7 64-bit. A little while after I wrote my comment, I happened to try the same thing in 8.4.8 32-bit, and it worked.
So far, I’ve followed the code far enough to see that it’s an error from boost::regex, and that it’s at least somewhat related to a limiting value stored in a ptrdiff_t, which at least some of the time is set to a hard-coded value of 100000000. That value fits within a signed 32-bit ptrdiff_t (maximum 2,147,483,647), but ptrdiff_t is 32 bits in 32-bit Windows and 64 bits in 64-bit Windows, so calculations derived from it may still behave differently in the two builds.
-
@Coises said in Auto insert break text every x amount of lines:
Here’s where I’m baffled
I recall a conversation some time ago, possibly a few years back, when some of the posters (I think it was mostly the experienced ones) ran into similar issues with overwhelming the regex engine. I haven’t located it yet, but if I recall correctly, @guy038 also posted in that thread.
I think the conclusion was that this issue cannot be predicted solely from the volume of characters processed. The environment, including plugins, undo functionality and other settings, could all play a part in the error occurring.
I can see your regex doesn’t use \K which may have affected your outcome although I’m not sure that hypothesis could be verified easily and be reproducible.
Terry
-
“Is 27333 magical?” is one of the earlier threads, and it links to other threads about these mysterious failures.
For the OP, I was thinking in terms of skipping lines without saving anything, so I used both a non-capturing group and \K:
Search: (?-s)^(?:.*?\R){199}\K
Replace: =========================================\r\n
This took about one second on both x32 and x64 builds of v8.5.8 on a 100,000-line file with each line having 500 characters (a 50,200,000-byte file).
Using the same expression with (?s) instead of (?-s) also worked, but then reported “Invalid Regular Expression.” That puzzled me, as I had intentionally used .*? so that I could test the same expression in both dot-not-matches-newline and dot-matches-newline modes. This version works, as I knew the scanner would stop at the newlines: (?-s)^(?:.*\R){199}\K
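Whether the quantifier is `{199}` or `{200}` decides where the separators land. Each full match consumes n original lines and adds one separator line, so the k-th separator ends up on output line k × (n + 1). A small Python check of that arithmetic (the helper name is hypothetical):

```python
def separator_lines(total_lines, n):
    """1-based output line numbers of the separators when one is inserted after
    every n original lines (hypothetical helper). Assumes every original line,
    including the last, ends with a line break, so each full block is matched."""
    full_blocks = total_lines // n  # a partial final block gets no separator
    # Separator k follows k * n original lines and k - 1 earlier separators,
    # so it is output line k * n + k = k * (n + 1).
    return [k * (n + 1) for k in range(1, full_blocks + 1)]


print(separator_lines(600, 199))  # [200, 400, 600]
print(separator_lines(600, 200))  # [201, 402, 603]
```

So the `{199}` expressions put the separators on lines 200, 400, 600 of the result, as the OP asked, while `{200}` puts them after original lines 200, 400, 600.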
-
@Terry-R said in Auto insert break text every x amount of lines:
I can see your regex doesn’t use \K which may have affected your outcome although I’m not sure that hypothesis could be verified easily and be reproducible.
Apparently I needed some sleep before I tried to answer the original question. My expression was:
((.*?)\r\n){199}
and, of course, should have been:
^(.*?\r\n){199}
which doesn’t cause a problem, so long as “. matches newline” is not checked. But given that condition, or the equivalent need for (?-s), it should have been .*, not .*?, so better yet:
(?-s)^((.*)\r\n){199}
with or without a trailing \K. The problem wasn’t the lack of \K; it was forgetting the caret at the beginning. The \K version does make more sense, though, unless you want to count or replace step-wise.
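The point about `.*` versus `.*?` is easy to check in Python's `re`, where `re.DOTALL` plays the role of `(?s)` (“. matches newline”). A two-line repeat stands in for `{199}` to keep the sample short; this illustrates the matching semantics only and does not reproduce the boost::regex error.

```python
import re

text = "a\nb\nc\nd\n"

# (?-s) equivalent: dot stops at newlines, so greedy .* takes exactly one line.
greedy_no_dotall = re.match(r"(?:.*\n){2}", text).group(0)
# (?s) equivalent: greedy .* runs across newlines and backtracks from the end,
# so the two repetitions swallow the whole text.
greedy_dotall = re.match(r"(?:.*\n){2}", text, re.DOTALL).group(0)
# (?s) with lazy .*? stops at the first newline again.
lazy_dotall = re.match(r"(?:.*?\n){2}", text, re.DOTALL).group(0)

print(repr(greedy_no_dotall))  # 'a\nb\n'
print(repr(greedy_dotall))     # 'a\nb\nc\nd\n'
print(repr(lazy_dotall))       # 'a\nb\n'
```

With dot-not-matches-newline, the greedy and lazy forms select the same lines, so the simpler greedy `.*` is enough; the lazy form only matters when dot can cross line breaks.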
From reading the boost::regex code and the comments, this error message comes up when a test within the matching process in boost::regex guesses that the number of potential alternatives is growing without bound. This is probably a special case of the infamous Halting problem; if so, the only possible resolution for boost is to use a heuristic.
-
Hello, @andrew-casey, @coises, @terry-r, @mkupper and All,
I did some tests with N++ v8.5.4 64-bit. I used a text file of 11,212,425 bytes, containing 158,760 lines, with an average of 70 characters per line.
I decided to insert a line of equal signs every 1,000 lines. Thus, the result should be 158 consecutive blocks of (1,000 lines + the line of =) followed by the remaining 760 lines (as 158,760 = 158 × 1,000 + 760).
After verification, I can affirm that the two regex S/R:
- SEARCH (?-s)^(?:.*\R){1000}\K
- REPLACE ========================\r\n
and:
- SEARCH (?-s)^(?:.*\R){1000}
- REPLACE $0========================\r\n
do separate the text, as expected, into blocks of 1,000 lines, with 760 lines remaining.
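The expected shape follows from integer division: divmod(158760, 1000) is (158, 760). Below is a sketch of how such a result could be checked automatically in Python. It assumes a separator of 24 equal signs, uses synthetic lines instead of the real 11 MB test file, and relies on a hypothetical `check_blocks` helper; the `$0` expression is translated by writing `m.group(0)` back before the separator.

```python
import re

SEP = "=" * 24


def check_blocks(out_lines, n, original_count):
    """True if out_lines is full n-line blocks, each followed by SEP, then a
    shorter tail without a separator (hypothetical helper)."""
    full, tail = divmod(original_count, n)
    if len(out_lines) != full * (n + 1) + tail:
        return False
    # The separator closing block k sits at 0-based index k * (n + 1) - 1.
    if any(out_lines[k * (n + 1) - 1] != SEP for k in range(1, full + 1)):
        return False
    # No separator anywhere else.
    return out_lines.count(SEP) == full


# Synthetic stand-in for the 158,760-line test file.
original = [f"text line {i}" for i in range(1, 158_761)]
text = "\n".join(original) + "\n"
# Python counterpart of (?-s)^(?:.*\R){1000} replaced by $0 plus the separator.
result = re.sub(r"(?m)^(?:.*\n){1000}", lambda m: m.group(0) + SEP + "\n", text)
out_lines = result.splitlines()
print(check_blocks(out_lines, 1000, len(original)))
```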
Now, we can also use @coises’s initial method, with the Column Editor.
First, with the same file, we add a vertical separator at the beginning of all (non-empty) lines, with the regex S/R:
- SEARCH (?-s)^(?=.)
- REPLACE \xA6
Secondly, let’s run the Edit > Column Editor option:
- Choose the Number to Insert option
- Type 1 in the two zones Initial Number: and Increase by:
- Type 1,000 in the Repeat: zone
- Select the Zeros choice for the Leading: zone
- If necessary, choose the Dec format
- Click on the OK button
- Delete the isolated numbering of the last line (value 159)
Thirdly, we’re going to add a separator line each time the leading numbering changes, with the following regex S/R:
- SEARCH (?-s)^(\d+)\xA6.+\R\K(?!\1)(?!\Z)
- REPLACE ========================\r\n
Finally, we get rid of this leading numbering with this simple regex S/R:
- SEARCH ^\d+\xA6
- REPLACE (leave EMPTY)
Compared with the examples above that use just ONE regex S/R, this gave identical results!
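The four steps can also be emulated in Python to see how they fit together. This is a sketch under assumptions: no empty lines (step 1's `(?=.)` skips them, so they would keep their numbers), LF line endings, and Python's `re`, which has no `\K`, so step 3 writes the match back before the separator. Because the emulated text has no trailing empty line, there is no isolated last number to delete by hand. The helper name is hypothetical.

```python
import re

SEP = "=" * 24


def column_editor_blocks(lines, repeat=1000, sep=SEP):
    """Emulate the four S/R + Column Editor steps (hypothetical helper).

    Assumes no empty lines and LF line endings.
    """
    # Step 1: (?-s)^(?=.) -> \xA6 puts a broken bar (U+00A6) before each non-empty line.
    text = re.sub(r"(?m)^(?=.)", "\u00a6", "\n".join(lines) + "\n")

    # Step 2: Column Editor with initial 1, increase 1, repeat `repeat`, leading zeros.
    rows = text.split("\n")[:-1]  # text ends with "\n": drop the empty piece after it
    width = len(str(len(rows) // repeat + 1))
    text = "".join(f"{i // repeat + 1:0{width}d}{row}\n" for i, row in enumerate(rows))

    # Step 3: (?-s)^(\d+)\xA6.+\R\K(?!\1)(?!\Z); without \K, write the match back.
    # (?!\1): the next line has a different number; (?!\Z): not after the last line.
    text = re.sub(
        r"(?m)^(\d+)\xa6.+\n(?!\1)(?!\Z)",
        lambda m: m.group(0) + sep + "\n",
        text,
    )

    # Step 4: ^\d+\xA6 -> (empty) removes the numbering and the marker.
    text = re.sub(r"(?m)^\d+\xa6", "", text)
    return text.splitlines()


lines = [f"data {i}" for i in range(1, 2501)]
out = column_editor_blocks(lines)
print(len(out), out.count(SEP))
```

On files whose line count is an exact multiple of 1,000, the two approaches may differ at the very end: `(?!\Z)` suppresses a separator after the last block, whereas the single-regex version adds one there when the file ends with a line break.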
Best Regards,
guy038
-
Excellent, worked great. Thank you!