How to merge



  • @Ekopalypse

    I think there is way more to it than that… :(

    I really think it is a programming problem rather than a Notepad++ problem, though, and since we don’t discuss generic programming per se here, that’s all I’ll say.



  • @Alan-Kilborn

    ahh, you mean the OP has more than just two huge files?
    Might be … ok …



  • @Ekopalypse

    No, OP has 2 files. Why is it not clear from the OP’s original example what he wants to achieve?

    The waters were muddied by a pointless YouTube exercise and then the OP giving an incomplete longer data dump.

    I think the problem statement is clear (from the original posting), I just don’t know how to achieve it for him, without generic programming.



  • @Alan-Kilborn

    what is wrong with my post? The missing step to remove the spaces in between the resulting two columns? I would expect that this isn’t an issue but …



  • @Ekopalypse

    OK, I yield. :)





  • @Ekopalypse said:

    what is wrong with my post?

    In the original example, the OP wanted the contents of the second file appended to the lines repeatedly, to fill up. So if there were 1234 lines in the first file, and 789 lines in the second, the 789 lines would need to be appended to the first 789 of the 1234 lines, and then the remaining 445 lines of the first file would need the first 445 lines of the second file appended. Your algorithm doesn’t handle a difference in the lengths of the files.
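
    The wrap-around pairing described in this post can be sketched in plain Python. This is only an illustration of that reading of the request; the function name append_cyclically and the sample lists are made up for the example, and it runs outside Notepad++.

```python
from itertools import cycle

def append_cyclically(first_lines, second_lines):
    """Append the lines of second_lines to first_lines, one per line,
    starting over at the top of second_lines whenever it runs out."""
    if not second_lines:
        # Nothing to append; return the first list unchanged.
        return list(first_lines)
    # zip() stops when first_lines is exhausted; cycle() never runs out.
    return [a + b for a, b in zip(first_lines, cycle(second_lines))]

names = ["adam", "afdal", "anang", "abas", "abbas"]   # hypothetical sample data
numbers = ["123", "234"]
result = append_cyclically(names, numbers)
# result == ["adam123", "afdal234", "anang123", "abas234", "abbas123"]
```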



  • @PeterJones

    thx.
    aaahhh - now I see and finally understand Alan’s comment
    Somehow duplicate each line in the names file as many times as there are numbers in the numbers file

    Where is the wood, where is the wood I only see trees? :-D



  • Hello, @adam-luwiko, and All,

    I found out a possible solution, … using regular expressions of course ;-))

    I’m using your two lists, given in the post, below :

    https://notepad-plus-plus.org/community/topic/18139/how-to-merge/3


    • First, open a copy of your Numbers.lst file, in Notepad++

    • Open the Replace dialog ( Ctrl + H )

    • SEARCH \R

    • REPLACE \x20

    • Tick the Wrap around option

    • Select the Regular expression search mode

    • Click, once, on the Replace All button

    => The 75 numbers should be gathered in a single line, only, as below :

    123 234 345 456 567 678 789 890 321 432 543 654 765 876 987 098 121 131 141 151 161 171 181 191 101 212 222 232 242 252 262 272 282 292 202 313 323 333 343 353 363 373 383 393 303 414 424 434 444 454 464 474 484 494 404 515 525 535 545 555 565 575 585 595 505 616 626 636 646 656 666 676 686 696 606
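
    For readers who want to check this step outside Notepad++, here is a rough Python equivalent of the SEARCH \R / REPLACE \x20 pass. It is a sketch only: Python's re module has no \R, so the pattern below approximates it with the three common line-break forms, and join_lines is a name invented for the example.

```python
import re

# Approximation of \R: Windows (CR LF), old Mac (CR) and Unix (LF) line breaks.
LINE_BREAK = re.compile(r"\r\n|\r|\n")

def join_lines(text):
    """Replace every line break with a single space."""
    return LINE_BREAK.sub(" ", text)

sample = "123\r\n234\r\n345"
joined = join_lines(sample)
# joined == "123 234 345"
```

    Note that a file ending with a line break leaves a trailing space on the joined line, which may need trimming before the list is pasted into the Replace field.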
    

    • Now, open a copy of your Names.lst file, in Notepad++

    • Open the Replace dialog ( Ctrl + H )

    • SEARCH (?-s).+

    • Begin the Replace field with the regex $0\x20 ( \x20 represents a single space char )

    • Then, add your one-line list of numbers, to the Replace field

    So, the contents of the Replace zone should be, as below :

    $0\x20123 234 345 456 567 678 789 890 321 432 543 654 765 876 987 098 121 131 141 151 161 171 181 191 101 212 222 232 242 252 262 272 282 292 202 313 323 333 343 353 363 373 383 393 303 414 424 434 444 454 464 474 484 494 404 515 525 535 545 555 565 575 585 595 505 616 626 636 646 656 666 676 686 696 606
    
    • Tick the Wrap around option

    • Select the Regular expression search mode

    • Click, once, on the Replace All button

    => The 81 names should be followed by your one-line list of numbers, as below :

    adam 123 234 345 456 567 678 789 890 321 432 543 654 765 876 987 098 121 131 141 151 161 171 181 191 101 212 222 232 242 252 262 272 282 292 202 313 323 333 343 353 363 373 383 393 303 414 424 434 444 454 464 474 484 494 404 515 525 535 545 555 565 575 585 595 505 616 626 636 646 656 666 676 686 696 606
    afdal 123 234 345 456 567 678 789 890 321 432 543 654 765 876 987 098 121 131 141 151 161 171 181 191 101 212 222 232 242 252 262 272 282 292 202 313 323 333 343 353 363 373 383 393 303 414 424 434 444 454 464 474 484 494 404 515 525 535 545 555 565 575 585 595 505 616 626 636 646 656 666 676 686 696 606
    anang 123 234 345 456 567 678 789 890 321 432 543 654 765 876 987 098 121 131 141 151 161 171 181 191 101 212 222 232 242 252 262 272 282 292 202 313 323 333 343 353 363 373 383 393 303 414 424 434 444 454 464 474 484 494 404 515 525 535 545 555 565 575 585 595 505 616 626 636 646 656 666 676 686 696 606
    .....
    .....
    .....
    abas 123 234 345 456 567 678 789 890 321 432 543 654 765 876 987 098 121 131 141 151 161 171 181 191 101 212 222 232 242 252 262 272 282 292 202 313 323 333 343 353 363 373 383 393 303 414 424 434 444 454 464 474 484 494 404 515 525 535 545 555 565 575 585 595 505 616 626 636 646 656 666 676 686 696 606
    abbas 123 234 345 456 567 678 789 890 321 432 543 654 765 876 987 098 121 131 141 151 161 171 181 191 101 212 222 232 242 252 262 272 282 292 202 313 323 333 343 353 363 373 383 393 303 414 424 434 444 454 464 474 484 494 404 515 525 535 545 555 565 575 585 595 505 616 626 636 646 656 666 676 686 696 606
    abdul 123 234 345 456 567 678 789 890 321 432 543 654 765 876 987 098 121 131 141 151 161 171 181 191 101 212 222 232 242 252 262 272 282 292 202 313 323 333 343 353 363 373 383 393 303 414 424 434 444 454 464 474 484 494 404 515 525 535 545 555 565 575 585 595 505 616 626 636 646 656 666 676 686 696 606
    

    IMPORTANT : You cannot insert more than 2,046 characters, in the Replace zone. So, in case of a huge list of numbers :

    • Split it up, first, in blocks of, let’s say, 2040 characters, max

    • Modify the Replace zone as required

    • Repeat the previous regex S/R

    BTW, the maximum of characters, allowed in the Text to Insert zone of the Column Editor, is only 1023 !
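
    Splitting a long list into blocks by hand is tedious, so here is a minimal sketch of one way to do it in Python. The helper name split_into_blocks and the sample data are assumptions made for the example. The 2040 limit is the one suggested above; it leaves room for the 6-character $0\x20 prefix within the 2,046-character field.

```python
def split_into_blocks(items, max_chars=2040, sep=" "):
    """Group items into sep-joined blocks of at most max_chars characters.

    A single item longer than max_chars still gets a block of its own,
    so such a block can exceed the limit.
    """
    blocks, current = [], ""
    for item in items:
        candidate = item if not current else current + sep + item
        if len(candidate) > max_chars and current:
            # The item does not fit: close the current block, start a new one.
            blocks.append(current)
            current = item
        else:
            current = candidate
    if current:
        blocks.append(current)
    return blocks

numbers = [str(n) for n in range(100, 1100)]   # hypothetical list of 1000 numbers
blocks = split_into_blocks(numbers)
```

    Each block can then be pasted, in turn, after $0\x20 in the Replace field, and the S/R repeated once per block.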


    Right ! Now, here is the main regex S/R :

    • SEARCH ^(\w+)\h+(\d+)(($)|)

    • REPLACE \1\2\r\n?4:\1

    • Tick the Wrap around option

    • Select the Regular expression search mode

    • Press the ALT + A shortcut repeatedly ( same as clicking the Replace All button ) until the message Replace All: 0 occurrences were replaced appears at the bottom of the Replace dialog !

    And… you’ll get your expected list :

    adam123
    adam234
    adam345
    .....
    .....
    adam686
    adam696
    adam606
    
    afdal123
    afdal234
    afdal345
    .....
    .....
    afdal686
    afdal696
    afdal606
    
    .....
    .....
    .....
    .....
    .....
    
    abbas123
    abbas234
    abbas345
    ......
    ......
    abbas686
    abbas696
    abbas606
    
    abdul123
    abdul234
    abdul345
    ......
    ......
    abdul686
    abdul696
    abdul606
    

    Notes : Each time, the Replace All action is run :

    • Each name and its closest number are rewritten, without any blank character between them, followed by a Windows line-break

    • Then, if the last number of the list has not been reached, the name is rewritten again, so that it is, implicitly, followed by all the remaining numbers

    Remark : If you do not want the blank line between the blocks of two names, change the Replace zone into :

    REPLACE \1\2?4:\r\n\1
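
    To see why the Replace All has to be repeated, the main S/R can be simulated in Python. This is a sketch under a few assumptions: Python's re module has neither \h nor the ?4: conditional replacement, so a [ \t] class and a replacement function stand in for them; the sample text uses plain \n line breaks; and the names expand and _replace are invented for the example.

```python
import re

# Same idea as ^(\w+)\h+(\d+)(($)|) : group 4 takes part in the match
# only when the number is the last one on its line.
PATTERN = re.compile(r"^(\w+)[ \t]+(\d+)(($)|)", re.MULTILINE)

def _replace(match):
    name, number = match.group(1), match.group(2)
    # Mimics \1\2\r\n?4:\1 : rewrite the name only if more numbers follow.
    tail = "" if match.group(4) is not None else name
    return name + number + "\n" + tail

def expand(text):
    """Repeat the substitution until a pass replaces nothing.

    Returns the final text and the number of passes that changed it."""
    passes = 0
    while True:
        text, count = PATTERN.subn(_replace, text)
        if count == 0:
            return text, passes
        passes += 1

text = "adam 123 234 345\nafdal 123 234 345\n"
expanded, passes = expand(text)
# passes == 3, one pass per number in the list
```

    Each pass peels one number off every remaining line, so the number of effective passes equals the number of entries in the one-line list.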

    Best Regards,

    guy038



  • It was stuck in my craw; I knew that Python must have a way of doing a Cartesian Product (though I first had to remember the term for that permutation of two lists). It does: itertools.product.

    Since I liked the idea of practicing iterables/generators, I decided to implement it – it’s actually pretty short in terms of the amount of Python, especially ignoring comments / extras:

    # encoding=utf-8
    """in response to https://notepad-plus-plus.org/community/topic/18139/
    
    You can merge two files using the cartesian product <https://en.wikipedia.org/wiki/Cartesian_product>,
    which is implemented in itertools.product() <https://docs.python.org/2/library/itertools.html#itertools.product>
    
    assumes:
    * your file1.txt (names) is in the primary notepad++ view (usually the left)
    * your file2.txt (numbers) is in the secondary notepad++ view (you can RClick on the title tab and Move to Other View)
    * you want the merged file to end up in file1.txt
    * you want to be able to undo if something goes wrong
    """
    from Npp import *
    import itertools
    
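    def allLinesNoEOL(scint = editor):
        """a generator to yield all the non-empty lines of a given scintilla instance,
    
        All lines have trailing whitespace removed (ie, end-of-lines)
        Empty lines are skipped, so the empty line that follows a final
        end-of-line does not produce half-filled output lines
    
        scint defaults to the active editor if not supplied
        """
        for n in range(scint.getLineCount()):
            line = scint.getLine(n).rstrip()
            if line:    # skip empty lines
                yield line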
    
    # thanks to @Ekopalypse and @Alan-Kilborn for https://notepad-plus-plus.org/community/topic/18133/regex-rounding-numbers-python-script-does-not-run-properly/24
    try:
        hidden
    except NameError:
        hidden = notepad.createScintilla()
    
    hidden.setText("")
    for p in itertools.product(allLinesNoEOL(editor1), allLinesNoEOL(editor2)):
        hidden.addText(p[0]+p[1]+"\n")
    
    editor1.beginUndoAction()
    editor1.setText(hidden.getText())
    editor1.endUndoAction()
    

    Some benefits of this methodology:

    • it doesn’t have a regex length restriction
    • because the pythonscript is working on just one line at a time, it doesn’t take up much more memory than whatever the files occupy in Notepad++
    • a single undo will undo the whole merge into file1.txt
    • it gave me practice programming / using a generator function – oh, this probably doesn’t help you as much; sorry. :-)


  • @PeterJones said:

    • … it doesn’t take up much more memory than whatever the files occupy in Notepad++

    whoops, that’s a lie. I wrote that thinking “because I don’t copy both files into lists or tuples, it doesn’t use huge memory”. But since I do use a temporary scintilla to hold the results, I actually do duplicate things.

    Might also want to append hidden.setText("") after hidden has been copied into editor1.

    • it doesn’t have a regex length restriction

    Still, props to @guy038 for his regex miracle.



  • @PeterJones said:

    I do use a temporary scintilla

    Yes, I realized after writing that phrase that I could have gotten away with hidden just being a string, rather than a full scintilla object, and thus saving the scintilla overhead (including the do-not-destroy restriction).

    Still, with the scintilla object already there, it becomes an extensible procedure, where one could in theory perform any scintilla-esque action upon the resulting text before copying it into editor1. :-) (Yeah, that’ll justify leaving it as-is. Uh-huh.)
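
    The plain-string variant mentioned here could look roughly like the sketch below. It is written as a standalone function so it can be run outside Notepad++; merge_lines is a name invented for the example. Inside PythonScript, the two arguments would presumably be allLinesNoEOL(editor1) and allLinesNoEOL(editor2), and the result would be handed to editor1.setText(...) as in the script above.

```python
import itertools

def merge_lines(names, numbers, eol="\n"):
    """Join every name to every number, one pair per line, in one string.

    itertools.product() reads both inputs completely before it yields
    anything, so both lists are held in memory alongside the result."""
    return "".join(a + b + eol for a, b in itertools.product(names, numbers))

merged = merge_lines(["adam", "afdal"], ["123", "234", "345"])
# merged == "adam123\nadam234\nadam345\nafdal123\nafdal234\nafdal345\n"
```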



  • @guy038 THANK YOU SO MUCH, sir, that’s what I’m talking about. You saved my life. God bless you, I hope you have an amazing day.

