Joining every 100000 lines into 1 line

  • M
    M P
    last edited by Aug 7, 2022, 9:34 AM

    Hi. I have 10 million lines and need to join every 100000 lines into 1 line, with the goal of having 100 lines in the end.

    I have found an approach that works like this:

    Find: (.)\r\n(.)\r\n(.)\r\n(.)\r\n(.*)\r\n
    Replace with: \1\2\3\4\5\r\n

    but the result is not quite what I want after repeating that process multiple times, as I don't end up with exactly 100000 lines joined into each line.

    Would love to get some help here.

    • T
      Terry R @M P
      last edited by Aug 7, 2022, 8:38 PM

      @M-P said in Joining every 100000 lines into 1 line:

      but the result is not quite what I want after repeating that process multiple times, as I don't end up with exactly 100000 lines joined into each line.
      Would love to get some help here.

      I found this question rather intriguing, fully expecting Notepad++ to be overwhelmed by the sheer number of lines and/or the number of characters produced when combining those lines. I must say I was pleasantly surprised.

      I created a 20-character line, then replicated it 10 times. I copied all of that and replicated it 10 times, and continued in that fashion until I had my 10M lines, with a grand total of 220M characters.

      I then ran the regex you see below. The first iteration took approximately 5 minutes to complete (I didn't think to start the stopwatch). The second iteration took about 30 seconds, the third about 8 seconds, and subsequent iterations took about 5 seconds each.

      And lo and behold, it actually worked, whereas I had expected it to crash.

      So, as you can see, I worked on the "power of 10" principle and just ran the regex 6 times to get the 100 lines required. I was somewhat surprised to see you'd worked with 5 lines at a time, when I thought it would be obvious that 10 was the number of lines to aim for.

      I note that your regex seems to only look for 1 character per line. If that’s true then you should have no problem with my solution.

      Find What: (?-s)(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)
      Replace With: ${1}${2}${3}${4}${5}${6}${7}${8}${9}${10}

      Terry
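
For anyone who wants to check the behaviour of the find/replace pair above on a small sample before running it on a 10M-line file, here is a minimal Python sketch. It is not part of the original answer and rests on some assumptions: Python's `re` module has no `\R`, so `\r?\n` stands in for it, `[^\r\n]+` stands in for `(?-s).+`, and the names `replace_all_pass` and `count_lines` as well as the 1000-line sample are purely illustrative.

```python
import re

# Assumption: Python's `re` has no \R, so \r?\n is used as a stand-in,
# and [^\r\n]+ approximates Notepad++'s (?-s).+ (one non-empty line).
LINE = r"([^\r\n]+)"
FIND = re.compile(r"\r?\n".join([LINE] * 10))  # 10 groups, 9 line breaks


def replace_all_pass(text: str) -> str:
    """Emulate one 'Replace All': join each run of 10 lines into one."""
    return FIND.sub(lambda m: "".join(m.groups()), text)


def count_lines(text: str) -> int:
    return len(text.splitlines())


# Illustrative sample: 1000 four-character lines, each ending in \n.
text = "\n".join(f"{i:04d}" for i in range(1000)) + "\n"

history = [count_lines(text)]
while count_lines(text) >= 10:
    text = replace_all_pass(text)
    history.append(count_lines(text))

print(history)  # [1000, 100, 10, 1]
```

Each pass divides the line count by 10, which is the "power of 10" effect described in the post. The sketch only models the line arithmetic; it says nothing about how long Notepad++ itself takes.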

      • G
        guy038
        last edited by guy038 Aug 8, 2022, 8:26 AM Aug 8, 2022, 8:17 AM

        Hello, @m-p, @terry-r and All,

        I've got a solution very similar to @terry-r's!

        SEARCH (?-s)^(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+)\R(.+\R)

        REPLACE $1$2$3$4$5$6$7$8$9$10

        Note that I include the line-break in group 10.

        Unlike in @terry-r's solution, you just need to click 5 times, consecutively, on the Replace All button.

        That is just the time needed to go from groups of 10^1 lines to groups of 10^5 lines!

        Best Regards,

        guy038
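
As a quick sanity check on the number of clicks, the arithmetic can be sketched in Python. This is only an illustration under the assumption that every pass joins complete groups of `group_size` lines; the function name `passes_needed` is hypothetical and not something from Notepad++.

```python
def passes_needed(total_lines: int, target_lines: int, group_size: int):
    """Return how many Replace All passes reach target_lines exactly,
    or None if repeated joining by group_size cannot land on it."""
    passes = 0
    lines = total_lines
    while lines > target_lines:
        if lines % group_size != 0:
            # Leftover lines would stay unjoined, so groups become uneven.
            return None
        lines //= group_size
        passes += 1
    return passes if lines == target_lines else None


print(passes_needed(10_000_000, 100, 10))  # 5
print(passes_needed(10_000_000, 100, 5))   # None
```

With groups of 10, the count goes 10M, 1M, 100k, 10k, 1k, 100, so five passes suffice. With groups of 5, the count reaches 128 after seven passes and can no longer be divided evenly, which is consistent with the original poster not getting exactly 100000 lines per joined line.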

        • M
          M P @guy038
          last edited by Aug 8, 2022, 12:24 PM

          @Terry-R @guy038 Thank you so much! You guys saved me a lot of time. It works perfectly.

          The Community of users of the Notepad++ text editor.