Merge 2 text files with exact same line and removing duplicates

  • D
    Devin Rusty
    last edited by Oct 12, 2018, 5:54 AM

    I have 2 files :

    FILE A :

    $ BEGIN STRING

    $ CONTEXT: Actors/1/description/ < UNTRANSLATED
    I walk through a number of battlefield, mercenary veteran who has survived. usually
    But good-natured, but once turn into berserk if Hajimare a fight.
    $ END STRING
    $ BEGIN STRING

    FILE B :

    $ BEGIN STRING
    数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
    温厚だが、ひとたび戦いが始まれば狂戦士と化す。
    $ CONTEXT: Actors/1/description/ < UNTRANSLATED

    $ END STRING

    How do I merge these 2 files so that I end up with this :

    Merged :

    $ BEGIN STRING
    数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
    温厚だが、ひとたび戦いが始まれば狂戦士と化す。
    $ CONTEXT: Actors/1/description/ < UNTRANSLATED
    I walk through a number of battlefield, mercenary veteran who has survived. usually
    But good-natured, but once turn into berserk if Hajimare a fight.
    $ END STRING

    • G
      guy038
      Oct 12, 2018, 11:06 AM (last edited by guy038 Oct 12, 2018, 6:14 PM)

      Hello, @devin-rusty, and All,

      Seemingly, the link between your two files is the line $ CONTEXT: Actors/1/description/ < UNTRANSLATED

      So, I’m going to use the same principle as the one used at the end of this post :

      https://notepad-plus-plus.org/community/topic/16446/is-there-a-way-to-hide-commands/13


      To test it, I created a sample of your File B, below, containing 3 records whose $ CONTEXT: lines differ only by the number 1, 2 or 3

      ----------------------- File B ----------------------------------
      
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/1/description/ < UNTRANSLATED
      
      $ END STRING
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/2/description/ < UNTRANSLATED
      
      $ END STRING
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/3/description/ < UNTRANSLATED
      
      $ END STRING
      

      Note : The Japanese text is identical in these 3 records !

      Then, I created a sample of your File A, below, containing 3 different blocks $ CONTEXT:...........$ END STRING

      ----------------------------------- File A ---------------------------------------------
      $ BEGIN STRING
      
      $ CONTEXT: Actors/1/description/ < UNTRANSLATED
      I walk through a number of battlefield, mercenary veteran who has survived. usually
      But good-natured, but once turn into berserk if Hajimare a fight.
      $ END STRING
      
      $ CONTEXT: Actors/2/description/ < UNTRANSLATED
      It is a simple try
      with any text
      $ END STRING
      
      $ CONTEXT: Actors/3/description/ < UNTRANSLATED
      Here is the last bunch
      of text to test my solution
      $ END STRING
      

      Note : I did not add the last line of your File A, as I supposed it was just the beginning of the next record !


      Now, here is the method used to solve your problem :

      • Paste all the File B contents in a new N++ tab

      • Add a new line of equal signs, for instance =================

      • Paste all the File A contents after this line

      => We end up with this text :

      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/1/description/ < UNTRANSLATED
      
      $ END STRING
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/2/description/ < UNTRANSLATED
      
      $ END STRING
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/3/description/ < UNTRANSLATED
      
      $ END STRING
      ====================================================
      $ BEGIN STRING
      
      $ CONTEXT: Actors/1/description/ < UNTRANSLATED
      I walk through a number of battlefield, mercenary veteran who has survived. usually
      But good-natured, but once turn into berserk if Hajimare a fight.
      $ END STRING
      
      $ CONTEXT: Actors/2/description/ < UNTRANSLATED
      It is a simple try
      with any text
      $ END STRING
      
      $ CONTEXT: Actors/3/description/ < UNTRANSLATED
      Here is the last bunch
      of text to test my solution
      $ END STRING
      

      Now, using the menu command Edit > Line Operations > Remove Empty Lines (Containing Blank characters), we get rid of all the blank lines, which gives the text below :

      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/1/description/ < UNTRANSLATED
      $ END STRING
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/2/description/ < UNTRANSLATED
      $ END STRING
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/3/description/ < UNTRANSLATED
      $ END STRING
      ====================================================
      $ BEGIN STRING
      $ CONTEXT: Actors/1/description/ < UNTRANSLATED
      I walk through a number of battlefield, mercenary veteran who has survived. usually
      But good-natured, but once turn into berserk if Hajimare a fight.
      $ END STRING
      $ BEGIN STRING
      $ CONTEXT: Actors/2/description/ < UNTRANSLATED
      It is a simple try
      with any text
      $ END STRING
      $ BEGIN STRING
      $ CONTEXT: Actors/3/description/ < UNTRANSLATED
      Here is the last bunch
      of text to test my solution
      $ END STRING
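
The preparation steps above (File B, a line of equal signs, File A, then removal of blank lines) can also be scripted. The following is only a minimal Python sketch: the names `SEPARATOR`, `drop_blank_lines` and `combine` are hypothetical, chosen for illustration, and the separator length is arbitrary.

```python
SEPARATOR = "=" * 20  # any line made only of '=' characters will do


def drop_blank_lines(text):
    """Remove lines that are empty or contain only whitespace."""
    return "".join(line + "\n" for line in text.splitlines() if line.strip())


def combine(file_b_text, file_a_text):
    """File B first, then the separator line, then File A, without blank lines."""
    return drop_blank_lines(file_b_text + "\n" + SEPARATOR + "\n" + file_a_text)


if __name__ == "__main__":
    # Tiny stand-in texts; real input would be the contents of the two files.
    print(combine("b1\n\nb2\n", "a1\n   \na2\n"))
```

The result always ends with a line break, which is what the line-based search in the next step expects.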
      
      • Finally, open the N++ Replace dialog ( Ctrl + H )

      • SEARCH (?-is)^(\$ CONTEXT:.+\R)(?=(?s).+\R\1(.+?)^\$ END STRING)|(?s)^=+.+

      • REPLACE \1\2

      • Set the Wrap around option

      • Select the Regular expression search mode

      • Click on the Replace All button

      Nice :-)) We get the expected text !

      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/1/description/ < UNTRANSLATED
      I walk through a number of battlefield, mercenary veteran who has survived. usually
      But good-natured, but once turn into berserk if Hajimare a fight.
      $ END STRING
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/2/description/ < UNTRANSLATED
      It is a simple try
      with any text
      $ END STRING
      $ BEGIN STRING
      数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
      温厚だが、ひとたび戦いが始まれば狂戦士と化す。
      $ CONTEXT: Actors/3/description/ < UNTRANSLATED
      Here is the last bunch
      of text to test my solution
      $ END STRING
      

      Notes : Globally, the search regex works as follows :

      • It matches every $ CONTEXT: line of the File B part, with its EOL chars ( stored as group 1 ), ONLY IF an identical line is found further on, in the File A part, after the line of equal signs. The look-ahead also grabs all the text after that identical line, up to the nearest $ END STRING ( stored as group 2 )

      • When NO more $ CONTEXT: line can be matched in the File B part, the second alternative matches from the line of equal signs ======= till the very end of the file

      • In replacement, any complete $ CONTEXT: line found in the File B part is rewritten as itself ( \1 ), followed by the block found in the File A part, after the identical $ CONTEXT: line ( \2 )

      • Then, all the text starting at the ========= line is simply deleted as, this time, groups 1 and 2 are not defined !
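
To experiment with the same idea outside Notepad++, here is a hedged Python sketch of the search and replace described in these notes. It is an adaptation, not the exact regex: Python's `re` module has no `\R`, and recent versions reject `(?s)` in the middle of a pattern, so `\n` and the scoped form `(?s:...)` are used instead. It assumes Python 3.6 or later (unmatched groups are replaced by an empty string) and input that uses `\n` line endings. The names `MERGE_RE`, `merge` and `sample` are hypothetical.

```python
import re

MERGE_RE = re.compile(
    r"^(\$ CONTEXT:.+\n)"                       # group 1: a CONTEXT line in the File B part
    r"(?=(?s:.+)\n\1((?s:.+?))^\$ END STRING)"  # group 2: text after the same line, further down
    r"|^=+(?s:.+)",                             # or: the separator line and all that follows
    re.MULTILINE,
)


def merge(combined):
    """Apply the replacement \\1\\2 to the combined text (B part, '=' line, A part)."""
    return MERGE_RE.sub(r"\1\2", combined)


sample = (
    "$ BEGIN STRING\n"
    "JAPANESE TEXT 1\n"
    "$ CONTEXT: Actors/1/description/ < UNTRANSLATED\n"
    "$ END STRING\n"
    "$ BEGIN STRING\n"
    "JAPANESE TEXT 2\n"
    "$ CONTEXT: Actors/2/description/ < UNTRANSLATED\n"
    "$ END STRING\n"
    "====================\n"
    "$ BEGIN STRING\n"
    "$ CONTEXT: Actors/1/description/ < UNTRANSLATED\n"
    "English text 1\n"
    "$ END STRING\n"
    "$ BEGIN STRING\n"
    "$ CONTEXT: Actors/2/description/ < UNTRANSLATED\n"
    "It is a simple try\n"
    "with any text\n"
    "$ END STRING\n"
)

if __name__ == "__main__":
    print(merge(sample))
```

Each $ CONTEXT: line of the first part is followed, in the output, by the text found after the identical line in the second part, and everything from the line of equal signs onwards is dropped.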

      Best Regards,

      guy038

      • D
        Devin Rusty
        Oct 12, 2018, 5:23 PM

        @guy038 said:

        (?-is)^(\$ CONTEXT:.+\R)(?=(?s).+\R\1(.+?)^\$ END STRING)|(?s)^=+.+

        Hey, thank you for your reply. Unfortunately, when I followed all of the steps, it just deleted all the text under the ====== line. My bad for not providing the ‘real’ document. Here is the real document, btw :

        > RPGMAKER TRANS PATCH FILE VERSION 3.2
        > BEGIN STRING
        エリック
        > CONTEXT: Actors/1/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        数々の戦場を渡り歩き、生き延びてきた歴戦の傭兵。普段は
        温厚だが、ひとたび戦いが始まれば狂戦士と化す。
        > CONTEXT: Actors/1/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        銀の死神
        > CONTEXT: Actors/1/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        ナタリー
        > CONTEXT: Actors/2/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        暗殺拳の達人を祖父にもつ少女。幼少のころからその技の
        すべてを叩き込まれている格闘術のエキスパート。
        > CONTEXT: Actors/2/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        紅蓮の迅雷
        > CONTEXT: Actors/2/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        テレンス
        > CONTEXT: Actors/3/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        謀略により地位を剥奪された聖騎士。真の騎士道を極めるため
        各地をさまよい修練を重ねている。
        > CONTEXT: Actors/3/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        流浪の聖騎士
        > CONTEXT: Actors/3/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        アーネスト
        > CONTEXT: Actors/4/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        師匠の仇を探して旅を続ける剣士。剣に魔力を宿らせる技
        「魔法剣」を体得している。
        > CONTEXT: Actors/4/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        魔剣を継ぐ者
        > CONTEXT: Actors/4/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        リョーマ
        > CONTEXT: Actors/5/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        東国では無双の剛剣と称された、桜花一刀流の使い手。
        流れるような動きから繰り出される一閃は、重く、鋭い。
        > CONTEXT: Actors/5/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        暁の剛剣
        > CONTEXT: Actors/5/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        ブレンダ
        > CONTEXT: Actors/6/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        森の精霊に育てられた少女。自然を愛し、森の平穏を乱す者を
        許さない。都会での生活にちょっとだけ憧れている。
        > CONTEXT: Actors/6/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        深緑の護り手
        > CONTEXT: Actors/6/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        リック
        > CONTEXT: Actors/7/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        束縛されることが嫌いな自称義賊の青年。軽口ばかり叩くが
        仲間のためなら命も張れる熱血漢。
        > CONTEXT: Actors/7/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        見えざる疾風
        > CONTEXT: Actors/7/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        アリス
        > CONTEXT: Actors/8/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        神託により聖女となることを運命づけられた女性。慈愛に満ち
        その愛情は敵に対しても等しく与えられる。
        > CONTEXT: Actors/8/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        救世の聖女
        > CONTEXT: Actors/8/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        イザベル
        > CONTEXT: Actors/9/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        永きに渡り人類に恐怖を与えてきた魔女。転生術の失敗により
        記憶の大半を失っているが、キレると本性が出る。
        > CONTEXT: Actors/9/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        優雅なる悪夢
        > CONTEXT: Actors/9/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        ノア
        > CONTEXT: Actors/10/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        俗世との交わりを避け、山奥に隠れ住む賢者。凶星の正体を
        調べるため、伝説にある「最果ての書庫」を探す旅に出る。
        > CONTEXT: Actors/10/description/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        星を見る者
        > CONTEXT: Actors/10/nickname/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        クラリス
        > CONTEXT: Actors/15/name/ < UNTRANSLATED
        > CONTEXT: Actors/20/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        マリー
        > CONTEXT: Actors/16/name/ < UNTRANSLATED
        > CONTEXT: Actors/21/name/ < UNTRANSLATED
        
        > END STRING
        ============================================================
        > BEGIN STRING
        
        > CONTEXT: Actors / 1/name/ < UNTRANSLATED
        Eric
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/1/description/ < UNTRANSLATED
        I walk through a number of battlefield, mercenary veteran who has survived. usually
        But good-natured, but once turn into berserk if Hajimare a fight.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/1/nickname/ < UNTRANSLATED
        Death of silver
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/2/name/ < UNTRANSLATED
        Natalie
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/2/description/ < UNTRANSLATED
        The girl with the grandfather a master of assassination fist. Of the skills from childhood
        Expert of fighting surgery that has been hammered all.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/2/nickname/ < UNTRANSLATED
        Thunderclap of Guren
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/3/name/ < UNTRANSLATED
        Terence
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/3/description/ < UNTRANSLATED
        St. knight that has been stripped of his position by the conspiracy. In order to master the true chivalry
        It has repeatedly wander training the country.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/3/nickname/ < UNTRANSLATED
        Exile of the Holy Knight
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/4/name/ < UNTRANSLATED
        Ernest
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/4/description/ < UNTRANSLATED
        Swordsman to continue the journey looking for the revenge of the teacher. Technique to dwell magic to sword
        It has mastered the "magic sword".
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/4/nickname/ < UNTRANSLATED
        The Inheritors magic sword
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/5/name/ < UNTRANSLATED
        Ryoma
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/5/description/ < UNTRANSLATED
        In the eastern provinces it was called Tsuyoshi sword of Muso, cherry blossoms ittō-ryū consumer of.
        Issen fed from flowing motion are heavy, sharp.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/5/nickname/ < UNTRANSLATED
        Akatsuki of Tsuyoshiken
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/6/name/ < UNTRANSLATED
        Brenda
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/6/description/ < UNTRANSLATED
        Girl who was brought up in the spirit of the forest. Love nature, those who disturb the peace of the forest
        unforgivable. Are longing only a little to the life in the city.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/6/nickname/ < UNTRANSLATED
        Dark green be safety hand
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/7/name/ < UNTRANSLATED
        Rick
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/7/description/ < UNTRANSLATED
        Youth of hate self-styled gentleman thief is to be bound. Hit just joke but
        Life also Harel dashing if it is for the fellow.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/7/nickname/ < UNTRANSLATED
        Invisible Gale
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/8/name/ < UNTRANSLATED
        Alice
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/8/description/ < UNTRANSLATED
        Woman destined to be a saint by the oracle. Benevolent
        The love is given equally to the enemy.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/8/nickname/ < UNTRANSLATED
        Salvation of the saint
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/9/name/ < UNTRANSLATED
        Isabel
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/9/description/ < UNTRANSLATED
        Witch has given fear to mankind over the eternal. By the failure of the reincarnation surgery
        I have lost the majority of memory, but leave expires and nature.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/9/nickname/ < UNTRANSLATED
        Elegance Naru nightmare
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/10/name/ < UNTRANSLATED
        Noah
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/10/description/ < UNTRANSLATED
        Avoid fellowship with worldly, live hidden deep in the mountains wise man. The identity of the evil stars
        Investigate, go on a journey to find the "farthest reaches of the archive" in the legend.
        > END STRING
        
        > BEGIN STRING
        
        > CONTEXT: Actors/10/nickname/ < UNTRANSLATED
        Those who see the stars
        > END STRING
        
        > BEGIN STRING
        > CONTEXT: Actors/15/name/ < UNTRANSLATED
        Claris
        > CONTEXT: Actors/20/name/ < UNTRANSLATED
        
        > END STRING
        
        > BEGIN STRING
        > CONTEXT: Actors/16/name/ < UNTRANSLATED
        Marie
        > CONTEXT: Actors/21/name/ < UNTRANSLATED
        
        > END STRING
        

        Hopefully you can help.

        • D
          Devin Rusty @guy038
          last edited by Oct 12, 2018, 5:45 PM

          @guy038 Hey, sorry for my reply above; I can’t edit or delete it. It seems that all the steps you provided work really well. In my case above, I replace all the > with $ because > kinda screw things up in Regular Expression . Thanks a lot.
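
An alternative to editing the document is to change the marker inside the regex. In the Notepad++ search field that would presumably mean writing > in place of each \$. The Python sketch below illustrates the idea with a hypothetical helper, `build_merge_regex`, which builds the pattern from whatever marker the lines start with. It is an adaptation for Python's `re` module (`\n` instead of `\R`, scoped `(?s:...)` groups), not the exact Notepad++ regex.

```python
import re


def build_merge_regex(marker):
    """Build the merge pattern for lines that start with `marker` ('$' or '>')."""
    m = re.escape(marker)  # '$' becomes '\$', so it is matched literally
    return re.compile(
        rf"^({m} CONTEXT:.+\n)(?=(?s:.+)\n\1((?s:.+?))^{m} END STRING)|^=+(?s:.+)",
        re.MULTILINE,
    )


text = (
    "> BEGIN STRING\n"
    "エリック\n"
    "> CONTEXT: Actors/1/name/ < UNTRANSLATED\n"
    "> END STRING\n"
    "==========\n"
    "> BEGIN STRING\n"
    "> CONTEXT: Actors/1/name/ < UNTRANSLATED\n"
    "Eric\n"
    "> END STRING\n"
)

if __name__ == "__main__":
    # Pattern built for '>' lines: the English text is merged in.
    print(build_merge_regex(">").sub(r"\1\2", text))
    # Pattern built for '$' lines: only the second alternative matches,
    # so everything from the '=' line onwards is deleted and nothing is merged.
    print(build_merge_regex("$").sub(r"\1\2", text))
```

The second call reproduces the symptom reported earlier: with a pattern written for $ lines and a document that uses > lines, the only visible effect is the deletion of everything under the line of equal signs.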

          • S
            Scott Sumner @Devin Rusty
            last edited by Oct 16, 2018, 5:51 PM

            @Devin-Rusty

            I replace all the > with $ because > kinda screw things up in Regular Expression

            I think you mean to say:

            I replace all the $ with > because $ kinda screw things up in Regular Expression

            If that’s truly what you meant, then yes, $ is a special character to regular expressions. You can still use it literally, but you have to do it as a combination of two characters (\$) instead of the single character $.
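
The same rule can be checked quickly in Python, whose `re` module also treats $ as an anchor. This is only an illustration in a different regex engine than the one Notepad++ uses; the variable `line` is a made-up example.

```python
import re

line = "$ END STRING"

# Unescaped: "$" is the end-of-line anchor, so the literal text is not matched.
print(re.search(r"^$ END STRING", line))            # None

# Escaped: "\$" matches a literal dollar sign.
print(re.search(r"^\$ END STRING", line).group())   # $ END STRING

# re.escape() adds the needed backslashes when the text comes from data.
print(re.search(re.escape(line), line) is not None)  # True
```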
