Community

    Joining every 100000 lines into 1 line

    • M P

      Hi. I have 10 million lines and need to join every 100000 lines into 1 line, with the goal of having 100 lines in the end.

      I have found an approach that works like this:

      Find: (.)\r\n(.)\r\n(.)\r\n(.)\r\n(.*)\r\n
      Replace with: \1\2\3\4\5\r\n

      but after repeating that process multiple times the result is not quite what I want, because I don't end up with exactly 100000 lines joined into each line.

      Would love to get some help here.

      • Terry R @M P

        @M-P said in Joining every 100000 lines into 1 line:

        but after repeating that process multiple times the result is not quite what I want, because I don't end up with exactly 100000 lines joined into each line.
        Would love to get some help here.

        I found this question rather intriguing. I fully expected Notepad++ to be overwhelmed by the sheer number of lines, by the number of characters in the combined lines, or by both. I must say I have been pleasantly surprised.

        I created a 20-character line, then replicated it 10 times. I copied the result and replicated that 10 times, and continued in the same way until I had my 10M lines, with a grand total of 220M characters.
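A rough Python sketch of how such a test file can be built by repeated tenfold replication. The line content, the function name `replicate_tenfold`, and the CRLF line ending are assumptions for illustration; the 220M total is consistent with 20 characters plus a 2-character CRLF per line, but the post does not say so explicitly.

```python
def replicate_tenfold(line: str, rounds: int, newline: str = "\r\n") -> str:
    """Start with one line and multiply the text by 10 in each round."""
    text = line + newline
    for _ in range(rounds):
        text = text * 10
    return text


LINE = "abcdefghijklmnopqrst"  # hypothetical 20-character line

# Small-scale check: 3 rounds give 10^3 lines (7 rounds would give 10M).
small = replicate_tenfold(LINE, rounds=3)
line_count = small.count("\r\n")

# Size of the full 10M-line file, assuming CRLF line endings are counted.
chars_per_line = len(LINE) + len("\r\n")
full_size = chars_per_line * 10_000_000
```

Seven rounds reproduce the 10M-line file, at the cost of holding roughly 220 MB of text in memory.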

        I then ran the regex you see below. The first iteration took approximately 5 mins (I didn’t think to start the stopwatch) to complete. The second iteration took about 30 seconds, then 8 seconds, and subsequent iterations took about 5 seconds each time.

        And lo and behold it actually worked, whereas I had expected it to crash.

        So as you can see I worked on the “power of 10” and just ran the regex 6 times to get the 100 lines required. I was somewhat surprised to see you’d worked with 5 lines at a time, when I thought it would be obvious that 10 was the number of lines to aim for.

        I note that your regex seems to only look for 1 character per line. If that’s true then you should have no problem with my solution.

        Find What: (?-s)(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)
        Replace With: ${1}${2}${3}${4}${5}${6}${7}${8}${9}${10}
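For anyone who wants to try the idea outside Notepad++, here is a minimal Python sketch of the same repeated ten-line join. It is an approximation, not the Notepad++ engine: Python's `re` has no `\R`, so `\r?\n` stands in for it, and `[^\r\n]+` plays the role of `(?-s)(.+)`. The names `join_tens` and `join_repeatedly` are made up for this example.

```python
import re

LINE = r"([^\r\n]+)"   # one non-empty line, without its line break
BREAK = r"\r?\n"       # stand-in for \R (LF or CRLF only)

# Ten captured lines separated by nine line breaks.
TEN_LINES = re.compile(BREAK.join([LINE] * 10))


def join_tens(text: str) -> str:
    """Join each run of ten consecutive non-empty lines into one line."""
    return TEN_LINES.sub(lambda m: "".join(m.groups()), text)


def join_repeatedly(text: str, passes: int) -> str:
    """Apply the ten-line join several times, like repeated Replace All."""
    for _ in range(passes):
        text = join_tens(text)
    return text


# 1000 three-character lines; two passes leave 10 lines of 100 joined lines.
sample = "\n".join(f"{i:03d}" for i in range(1000))
result = join_repeatedly(sample, 2)
lines = result.split("\n")
```

Each pass divides the line count by ten, so the number of passes controls how many original lines end up in each output line.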

        Terry

        • guy038

          Hello, @m-p, @terry-r and All,

          I’ve got a solution very similar to @terry-r’s!

          SEARCH (?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+\R)

          REPLACE $1$2$3$4$5$6$7$8$9$10

          Note that I include the line-break in group 10
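A small Python sketch of what capturing the line-break in group 10 changes. This is an approximation under stated assumptions: `\r?\n` stands in for `\R`, `[^\r\n]+` for `(?-s).+`, and the names `PLAIN`, `ANCHORED` and `join_once` are invented for the example. The behaviour shown is that of Python's `re`; it follows from ordinary regex semantics, but it has not been checked against Notepad++ itself.

```python
import re

LINE = r"([^\r\n]+)"
BREAK = r"\r?\n"

# Variant without a line break after the tenth line.
PLAIN = re.compile(BREAK.join([LINE] * 10))

# Variant anchored at line start, with the break captured inside group 10.
ANCHORED = re.compile(
    "^" + BREAK.join([LINE] * 9) + BREAK + r"([^\r\n]+\r?\n)",
    re.MULTILINE,
)


def join_once(pattern: re.Pattern, text: str) -> str:
    """Replace every match by the concatenation of its ten groups."""
    return pattern.sub(lambda m: "".join(m.groups()), text)


digits = [str(i % 10) for i in range(20)]   # 20 one-character lines
no_final_break = "\n".join(digits)          # last line has no line break
final_break = no_final_break + "\n"         # last line ends with a break

plain_result = join_once(PLAIN, no_final_break)
anchored_result = join_once(ANCHORED, no_final_break)
anchored_full = join_once(ANCHORED, final_break)
```

In this sketch the anchored variant leaves the last ten lines unjoined when the text does not end with a line break, so it is worth checking that the file's final line is terminated before running it.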


          Unlike in @terry-r's solution, you just need to click the Replace All button 5 times in a row

          That is one click for each power of ten, going from 10^1 up to 10^5 original lines joined per line!
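The pass count can be checked with a few lines of Python. This is only arithmetic on the numbers given in the thread (10 million lines, 100000 lines per output line); the function name `passes_needed` is made up for the example.

```python
def passes_needed(lines_per_output: int, lines_per_pass: int = 10) -> int:
    """Return how many passes merge exactly `lines_per_output` lines.

    Raises ValueError if repeated passes can never reach that count exactly.
    """
    passes = 0
    merged = 1  # original lines held in one line after `passes` passes
    while merged < lines_per_output:
        merged *= lines_per_pass
        passes += 1
    if merged != lines_per_output:
        raise ValueError(
            f"{lines_per_output} is not a power of {lines_per_pass}"
        )
    return passes


TOTAL_LINES = 10_000_000
LINES_PER_OUTPUT = 100_000

output_lines = TOTAL_LINES // LINES_PER_OUTPUT
ten_at_a_time = passes_needed(LINES_PER_OUTPUT, 10)

# Joining five lines per pass can only produce powers of 5 (5, 25, 125, ...).
try:
    five_at_a_time = passes_needed(LINES_PER_OUTPUT, 5)
except ValueError:
    five_at_a_time = None
```

Since 100000 is 10^5 but not a power of 5, a five-line regex can never land on exactly 100000 lines per output line, whereas a ten-line regex gets there in five passes.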

          Best Regards,

          guy038

          • M P @guy038

            @Terry-R @guy038 Thank you so much! You guys saved me a lot of time. It works perfectly.
