Joining every 100000 lines into 1 line
-
Hi. I have 10 million lines and need to join every 100000 lines into 1 line, with the goal of having 100 lines in the end.
I have found an approach that works like this:
Find: (.)\r\n(.)\r\n(.)\r\n(.)\r\n(.*)\r\n
Replace with: \1\2\3\4\5\r\n
but the result is not quite what I want after repeating that process multiple times, as I don’t end up with exactly 100000 original lines joined into each line.
Would love to get some help here.
-
@M-P said in Joining every 100000 lines into 1 line:
but the result is not quite what I want after repeating that process multiple times, as i don’t have exactly multiple 100000 lines each joined in 1 line in the end.
Would love to get some help here.
I found this question rather intriguing, fully expecting Notepad++ to be overwhelmed by the sheer number of lines and/or the number of characters on each line once those lines were combined. I must say I have been pleasantly surprised.
I created a 20 character line, then replicated that 10 times. I copied all of that and replicated it 10 times, and continued in that way until I had my 10M lines with a grand total of 220M characters.
I then ran the regex you see below. The first iteration took approximately 5 mins (I didn’t think to start the stopwatch) to complete. The second iteration took about 30 seconds, then 8 seconds, and subsequent iterations took about 5 seconds each time.
And lo and behold it actually worked, whereas I had thought it would have crashed.
So as you can see I worked on the “power of 10” and just ran the regex 6 times to get the 100 lines required. I was somewhat surprised to see you’d worked with 5 lines at a time, when I thought it would be obvious that 10 was the number of lines to aim for.
I note that your regex seems to only look for 1 character per line. If that’s true then you should have no problem with my solution.
Find What:
(?-s)(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)
Replace With:
${1}${2}${3}${4}${5}${6}${7}${8}${9}${10}
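For anyone who wants to check the idea on a small sample outside Notepad++, here is a minimal Python sketch of the same find/replace. It is only an approximation: Python's `re` module is not the engine Notepad++ uses and has no `\R`, so the line break is spelled out and `[^\r\n]+` stands in for `(?-s)(.+)`. The names `replace_all` and `join_in_passes` are made up for this illustration.

```python
import re

LINE = r"([^\r\n]+)"       # stands in for (?-s)(.+): one non-empty line
EOL = r"(?:\r\n|\r|\n)"    # stands in for \R, which Python's re lacks
# Ten captured lines with a line break between each pair, as in the Find What.
JOIN_TEN = re.compile(EOL.join([LINE] * 10))


def replace_all(text):
    """One 'Replace All' pass: glue each run of 10 lines into a single line."""
    return JOIN_TEN.sub(lambda m: "".join(m.groups()), text)


def join_in_passes(text, passes):
    """Repeat the pass; each repetition multiplies the joined size by 10."""
    for _ in range(passes):
        text = replace_all(text)
    return text


# Small stand-in for the real file: 1000 lines of 8 characters each.
sample = "\r\n".join(f"line{i:04d}" for i in range(1000))
result = join_in_passes(sample, 2)   # 1000 -> 100 -> 10 lines
print(len(result.split("\r\n")))
```

A run of fewer than 10 remaining lines is left untouched by a pass, which is the same behaviour as the regex above.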
Terry
-
Hello, @m-p, @terry-r and All,
I’ve got a solution very similar to @terry-r’s!
SEARCH
(?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+\R)
REPLACE
$1$2$3$4$5$6$7$8$9$10
Note that I include the line-break in group 10
Unlike in @terry-r’s solution, you just need to click 5 times, consecutively, on the Replace All button
So that is one click for each power of ten, from 10^1 up to 10^5!
Best Regards,
guy038
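The click count can be checked with a little arithmetic: each Replace All multiplies the number of original lines per joined line by the number of lines the regex captures. Below is a rough Python sketch of that count, using a made-up helper `passes_to_reach` and assuming every pass joins a fixed number of lines (at least 2).

```python
def passes_to_reach(target, lines_per_pass):
    """Return how many passes put exactly `target` original lines on one
    line, or None if repeated joining skips over that size.

    Assumes lines_per_pass >= 2 (hypothetical helper, for illustration).
    """
    joined, passes = 1, 0
    while joined < target:
        joined *= lines_per_pass   # one Replace All pass
        passes += 1
    return passes if joined == target else None


print(passes_to_reach(100_000, 10))  # 5
print(passes_to_reach(100_000, 5))   # None
```

The second call suggests why joining 5 lines at a time never lands on exactly 100000: the sizes go 5, 25, 125, ... 78125, 390625 and step over it.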
-