Replace Lines from different files

  • A
    Alexei Kurakin
    last edited by Mar 16, 2023, 10:50 AM

    Hello dear community,
    I have no past experience in Notepad++
    and I would like to ask for your help/instructions.
    I have a file to translate and it goes like this:
    Original file:

    GER;10	;Ja
    ENG;10	;Yes
    ITA;10	;Sì
    BGL;10	;
    FRC;10	;Oui
    NLD;10	;Ja
    ESP;10	;Sí
    
    GER;11	;\nHöhe des Bodens
    ENG;11	;\nHeight of bottom panel
    ITA;11	;\nAltezza ripiano (distanza da terra)
    BGL;11	;
    FRC;11	;\nHauteur du fond
    NLD;11	;\nHoogte van de bodem
    ESP;11	;\nEspesor de la base
    

    translated file:

    BGL;10  ;TEXT1
    BGL;11  ;TEXT2
    BGL;12  ;TEXT3
    

    I want to replace every line in the original file whose beginning matches
    the beginning of a line in the translated file with that translated line.

    • P
      PeterJones @Alexei Kurakin
      last edited by Mar 16, 2023, 2:27 PM

      @Alexei-Kurakin ,

      Notepad++ cannot natively do it in a single step. Either you will have to copy the second file to the end of the first, then do a complicated search-and-replace, or you will have to install the PythonScript Plugin for Notepad++, then populate the script I shared here with the data from your second file, and run the script.

      If you choose the search-and-replace technique, you can either wait for someone to chime in with a custom expression, or you can search the forum for “translate” and “regex” or “regular expression”

      • G
        guy038
        last edited by Mar 16, 2023, 6:26 PM

        Hello, @alexei-kurakin, @peterjones and All,

        As explained by @peterjones, a simple solution using regular expressions could be used !

        However, it depends on the size of your original file. So, what is its size and its number of lines ?

        I assume that the translated file is much smaller than the original one !

        Note that my future solution could add, simultaneously, the translation of several languages, all at once !

        See you later,

        Best Regards,

        guy038

        • A
          Alexei Kurakin @guy038
          last edited by Mar 17, 2023, 8:34 AM

          Hi, @guy038
          Original file has 550k lines :) (14.6MB)
          and it has 30 different languages
          my translated file has 17k lines only with translation
          of the specific language
          can I hear about your solution also?

          • G
            guy038
            last edited by Mar 17, 2023, 8:50 AM

            Hi, @alexei-kurakin, @peterjones and All,

            Thanks for your additional information. This confirms that a regex solution should be OK !

            Just follow all the following steps, carefully !


            • Copy-paste your original file into a new file named, let’s say, output.txt

            • At the end of the output.txt file, add a new line with, for instance, several equal signs

            • Then, right after, add the contents of your translated file

            Thus, the new output.txt file should contain the temporary text :

            GER;10	;Ja
            ENG;10	;Yes
            ITA;10	;Sì
            BGL;10	;
            FRC;10	;Oui
            NLD;10	;Ja
            ESP;10	;Sí
            
            GER;11	;\nHöhe des Bodens
            ENG;11	;\nHeight of bottom panel
            ITA;11	;\nAltezza ripiano (distanza da terra)
            BGL;11	;
            FRC;11	;\nHauteur du fond
            NLD;11	;\nHoogte van de bodem
            ESP;11	;\nEspesor de la base
            ===============================================
            BGL;10  ;TEXT1
            BGL;11  ;TEXT2
            BGL;12  ;TEXT3
            BGL;14  ;TEST
            
            • Move to the very beginning of the output.txt file

            • Now, open the Replace dialog ( Ctrl + H )

            • Uncheck all box options and select the Regular expression search mode

            • SEARCH (?x-is) ^ ( .+ ) \h+ ; $ (?= (?s: .+ =+ .+? ) ^ \1 \h+ ; ( .+ ) $ ) | (?s) ^ =+ .+

            • REPLACE ?1$0\2

            • Click once on the Replace All button ( or repeatedly on the Replace button )

            Here you are !

            You get your expected text :

            GER;10	;Ja
            ENG;10	;Yes
            ITA;10	;Sì
            BGL;10	;TEXT1
            FRC;10	;Oui
            NLD;10	;Ja
            ESP;10	;Sí
            
            GER;11	;\nHöhe des Bodens
            ENG;11	;\nHeight of bottom panel
            ITA;11	;\nAltezza ripiano (distanza da terra)
            BGL;11	;TEXT2
            FRC;11	;\nHauteur du fond
            NLD;11	;\nHoogte van de bodem
            ESP;11	;\nEspesor de la base
            

            =>

            • All the lines, which ended with a semicolon, are now completed with their translated counterparts

            • The line of equal signs and the contents of the translated file, below it, have been removed as well


            I also assume that your translated file does not contain the “same” line with different translations !

            For instance :

            BGL;10  ;TEXT1
            BGL;11  ;TEXT2
            BGL;10  ;A SECOND translation
            BGL;12  ;TEXT3
            BGL;11  ;an other text
            BGL;13  ;TEST
            BGL;10  ;A THIRD translation
            

            However, don’t worry, as this regex always chooses the first translation found in the list !

            So, given this version of the translated file, the BGL lines of the original file would become :

            BGL;10	;TEXT1
            BGL;11	;TEXT2
            BGL;12	;TEXT3
            BGL;13	;TEST
            

            Best Regards,

            guy038

            • G
              guy038
              last edited by guy038 Mar 17, 2023, 11:16 AM Mar 17, 2023, 8:56 AM

              Hello, @alexei-kurakin,

              I think that I spoke too quickly ! Because I initially thought that your original file had a size of 550 kb ( not 550k lines ! )

              So, I’m afraid that my regex method is useless and that you need a scripting solution !

              BR

              guy038

              You may test my regex solution with a smaller original file and, maybe, a smaller translated file ! But it should not work for your exact file sizes :-((

              I did a quick test : with an original file of about 100,000 lines, it still works. But for a file containing 500,000 lines, the results are erroneous !
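For files too large for the regex, the same merge can be sketched as a standalone script. The following is a minimal Python 3 sketch, not the script referred to earlier in the thread. It assumes every data line has the three-field layout `LANG;NUMBER;TEXT` shown in the samples, ignores the blanks or tabs around the first two fields (as `\h+` does in the regex), and keeps the first translation found for a key. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]


def parse(line: str) -> Optional[Tuple[Key, str]]:
    """Split 'LANG;NUMBER;TEXT' into ((lang, number), text); None for other lines."""
    parts = line.rstrip("\r\n").split(";", 2)
    if len(parts) != 3:
        return None
    lang, number, text = parts
    return (lang.strip(), number.strip()), text


def build_lookup(translated_lines: Iterable[str]) -> Dict[Key, str]:
    """Map each key to its translation; the first non-empty translation wins."""
    lookup: Dict[Key, str] = {}
    for line in translated_lines:
        parsed = parse(line)
        if parsed is None:
            continue
        key, text = parsed
        if text and key not in lookup:
            lookup[key] = text
    return lookup


def fill_translations(original_lines: Iterable[str], lookup: Dict[Key, str]) -> List[str]:
    """Append the translation to every original line whose text field is empty."""
    result: List[str] = []
    for line in original_lines:
        body = line.rstrip("\r\n")
        eol = line[len(body):]  # keep the original line ending
        parsed = parse(body)
        if parsed is not None:
            key, text = parsed
            if text == "" and key in lookup:
                body += lookup[key]
        result.append(body + eol)
    return result


# Small in-memory demonstration based on the sample data of this thread
original = ["GER;10\t;Ja\n", "BGL;10\t;\n", "\n", "BGL;11\t;\n", "BGL;12\t;\n"]
translated = ["BGL;10  ;TEXT1\n", "BGL;11  ;TEXT2\n", "BGL;10  ;A SECOND translation\n"]
merged = fill_translations(original, build_lookup(translated))
print("".join(merged), end="")
```

Because the lookup is a dictionary, each original line costs one lookup, so the 550k-line file should not be a problem. To use it on real files, read both with `open(path, encoding=..., newline="")` using the encoding your files actually have, and write `merged` back the same way.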

              • T
                Ted Plum @guy038
                last edited by Ted Plum Apr 1, 2023, 7:27 PM Apr 1, 2023, 7:15 PM

                @guy038 Hi, this is very close to what I’m doing. Could you help me as well? I have two files. One has text like this (a total of 27k lines):

                {
                "id": 2257,
                "Key": "gfr",
                "enUS": "ÿc8•ÿc:Flawed R",
                "zhTW": "瑕疵紅寶石",
                "deDE": "fehlerhafter Rubin",
                "esES": "Rubí estropeado",
                "frFR": "Rubis imparfait",
                "itIT": "Rubino Incrinato",
                "koKR": "하급 루비",
                "plPL": "Rubin ze skazą",
                "esMX": "Rubí imperfecto",
                "jaJP": "傷のあるルビー",
                "ptBR": "Rubi Imperfeito",
                "ruRU": "Мутный рубин",
                "zhCN": "有瑕疵的红宝石"
                },
                {
                "id": 2258,
                "Key": "gsr",
                "enUS": "ÿc8•ÿc:Ruby",
                "zhTW": "紅寶石",
                "deDE": "Rubin",
                "esES": "Rubí",
                "frFR": "Rubis",
                "itIT": "Rubino",
                "koKR": "루비",
                "plPL": "Rubin",
                "esMX": "[ms]Rubí",
                "jaJP": "ルビー",
                "ptBR": "Rubi",
                "ruRU": "[ms]Рубин",
                "zhCN": "红宝石"
                },

                and another file that has different values for the enUS strings. I want to replace every instance of text inside quotation marks in every enUS line with text from a similar file. For example:

                replace the string from file 1: "enUS": "ÿc8•ÿc:Flawed R",

                with

                the string from file 2: "enUS": "Flawed ruby",

                What do I modify in this search line?:

                (?x-is) ^ ( .+ ) \h+ ; $ (?= (?s: .+ =+ .+? ) ^ \1 \h+ ; ( .+ ) $ ) | (?s) ^ =+ .+

                Thanks!

                • Mark OlsonM
                  Mark Olson @Ted Plum
                  last edited by Mark Olson Apr 2, 2023, 4:37 AM Apr 2, 2023, 4:32 AM

                  @Ted-Plum
                  This file is JSON.
                  I would strongly recommend using a scripting language like Python to work with it. There are various nice JSON plugins, but none that work well for this specific use case.
                  You can use regex to work with JSON, but regexes will hit lots of evil edge cases in JSON that will make even the most hardened regexers question their sanity.

                  I don’t feel like writing an actual Python script for this right now, but it would probably look something like

                  # this is pseudo-code, don't try to execute it!
                  import json
                  import Npp
                  text_from = Npp.getTextOfFile(filename_from)
                  text_to = Npp.getTextOfFile(filename_to)
                  json_from = json.loads(text_from)
                  json_to = json.loads(text_to)
                  # at this point I'm assuming that both json files are objects mapping strings like "id" to other things
                  for key, val in json_from.items():
                      json_to[key] = val
                  mutated_text_to = json.dumps(json_to, indent=4)
                  Npp.setTextOfFile(filename_to, mutated_text_to)
                  
                  • T
                    Ted Plum @Mark Olson
                    last edited by Ted Plum Apr 2, 2023, 9:33 AM Apr 2, 2023, 9:30 AM

                    @Mark-Olson I’m not a programmer, I’m in the process of learning HTML and CSS currently, so my knowledge is limited. I’d like to learn this sort of skill, could you point me to where to begin, to make it faster. I hope it doesn’t involve learning the whole of Python. I want to learn it eventually, but that would probably take too long to be relevant to this particular problem.

                    • G
                      guy038
                      last edited by guy038 Apr 2, 2023, 12:17 PM Apr 2, 2023, 11:42 AM

                      Hello, @ted-plum,

                      Could you, first, show us some records of your File_2 ?

                      To my mind, it should list the new values, like below, in the SAME order as the enUS values found in File_1 ?

                      "enUS": "Flawed Ruby",
                      "enUS": "Text_2"
                      ...
                      ...
                      

                      Best Regards,

                      guy038

                      • T
                        Ted Plum @guy038
                        last edited by Ted Plum Apr 2, 2023, 1:07 PM Apr 2, 2023, 1:05 PM

                        @guy038 here:

                        https://drive.google.com/file/d/1D6eQbp0ZZWdsTO-ARlFCNgd4zJ2zvhvI/view?usp=sharing

                        https://drive.google.com/file/d/1zKY6pC3KK0egW1IyklMopP-6bJjrAAjw/view?usp=sharing

                        these are the two versions of the same list. Not all enUS values need replacing, some are the same, some are absent in the file, from which I want to take the replacements, and they need to remain.

                        • Mark OlsonM
                          Mark Olson
                          last edited by Mark Olson Apr 2, 2023, 3:04 PM Apr 2, 2023, 2:53 PM

                          @Ted-Plum said in Replace Lines from different files:

                          I’m not a programmer, I’m in the process of learning HTML and CSS currently, so my knowledge is limited. I’d like to learn this sort of skill, could you point me to where to begin, to make it faster. I hope it doesn’t involve learning the whole of Python.

                          Nobody learns the whole of Python. It’s a general-purpose language, and learning just a little bit can be really useful for just making your everyday life easier.

                          For better or for worse, you are currently working with raw JSON, and Python is one of the best ways for working with JSON. JavaScript is also great for this, but Notepad++ doesn’t have JS scripting support.

                          I’ve looked at your data (I’m a huge Diablo 2 fan, by the way) and here’s my solution:

                          from Npp import *
                          import json
                          
                          # both files must be initially open in Notepad++
                          
                          FROM_PATH = r'c:\full\path\to\from_items.json'
                          notepad.activateFile(FROM_PATH) # the file that you will be taking enUS values FROM
                          text_from = editor.getText() # read the entire file as a Python string
                          json_from = json.loads(text_from) # this translates the file into Python objects that can be manipulated more easily than text
                          
                          # now we do the same thing for the file that you will be moving enUS values TO
                          TO_PATH = r'c:\full\path\to\to_items.json'
                          notepad.activateFile(TO_PATH)
                          json_to = json.loads(editor.getText())
                          
                          # we don't know for sure what items are in from_items,
                          # so we will only get the ones that exist
                          from_keys_to_enUS = {}
                          for item in json_from:
                              # if this item has an enUS translation in this file, we map the key
                              # (which should be the same across both lists)
                              # to the enUS translation.
                              enUS = item.get('enUS')
                              if enUS:
                                  from_keys_to_enUS[item['Key']] = enUS
                          
                          for item in json_to:
                              key = item['Key']
                              # now we check if the old file has an "enUS" entry for this item
                              from_enUS = from_keys_to_enUS.get(key)
                              if from_enUS:
                                  # this transfers the enUS entry to the target file
                                  item['enUS'] = from_enUS
                          
                          # now we've transferred enUS values FROM the source file to the target file
                          # open up the target file and dump our edited values into it
                          notepad.activateFile(TO_PATH)
                          editor.setText(json.dumps(json_to, indent=4))
                          

                          There are many different ways to approach this problem in Python, and I chose the one that’s most Notepad++-friendly.

                          As for resources that one could use to get better at Python, three of my best friends when I was a noob were:

                          • StackOverflow (obviously)
                          • the Python standard library documentation (links to json and tutorial)
                          • Python for Everybody.
                          • T
                            Ted Plum @Mark Olson
                            last edited by Ted Plum Apr 2, 2023, 3:08 PM Apr 2, 2023, 3:07 PM

                            @Mark-Olson, thanks, I’ll have a go with the script.

                            Edit: All right, you’ve answered my question in your edit.

                            • Mark OlsonM
                              Mark Olson @Ted Plum
                              last edited by Apr 2, 2023, 3:11 PM

                              @Ted-Plum
                              See the resources that I linked in my most recent edit of the post.

                              I can’t promise that learning Python will make your life better. Initially I expect you will find it frustrating, but I get the impression that you will find plenty of opportunities to use your learnings before too long.

                              • G
                                guy038
                                last edited by guy038 Apr 2, 2023, 11:36 PM Apr 2, 2023, 11:30 PM

                                Hello, @ted-plum, @mark-olson and All,

                                Ah… OK, @ted-plum. The two files are quite similar in size and contents !

                                So, I downloaded your two files from Google Drive


                                • The first one is called !item-names.json. It’s a UTF-8-BOM encoded file with Windows line-breaks ( \r\n )

                                • The second one is called item-names.json. It’s a UTF-8 encoded file with Unix line-breaks ( \n )

                                In order to compare these two files easily :

                                • I first normalized these two files to the usual UTF-8 encoding and to Windows line-breaks. So :

                                  • I used the View > Encoding option for the !item-names.json file

                                  • I used the Edit > EOL Conversion > Windows (CR LF) option for the item-names.json file

                                • Secondly, I deleted the outer square brackets in the two files

                                • Thirdly, I ran the following regex S/R on these two files

                                  • SEARCH (?<!\},)(?<!\n)\r\n

                                  • REPLACE \t

                                This gives a single JSON record per line ( 24,576 occ. for !item-names.json and 25,424 occ. for item-names.json )
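To see what this search/replace does, here is a small Python sketch applying the same pattern with the standard `re` module. It assumes, as in the steps above, Windows line-breaks and records that end with `},`. The function name is hypothetical.

```python
import re

# A line-break is replaced by a tab unless it directly follows "}," (end of a
# record) or another line-feed (an empty line).
JOIN_LINES = re.compile(r"(?<!\},)(?<!\n)\r\n")


def one_record_per_line(text: str) -> str:
    """Join the lines of each JSON record with tabs, keeping one record per line."""
    return JOIN_LINES.sub("\t", text)


sample = (
    '{\r\n"id": 2257,\r\n"Key": "gfr"\r\n},\r\n'
    '{\r\n"id": 2258,\r\n"Key": "gsr"\r\n},\r\n'
)
flattened = one_record_per_line(sample)
print(flattened)
```

Each record of the sample ends up on a single line, with its fields separated by tabs, which makes the two files easy to compare line by line.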


                                Then, comparing these two files, I noticed that :

                                • Some records are different because field(s), other than enUS, are modified

                                • Some records are different because field enUS is modified

                                • Some records are different because both the enUS field and other field(s) are modified

                                • Near the end of the !item-names.json file, 6 records are deleted ( from id = 27345 to id = 27350 )

                                • Near the end of the item-names.json file, 54 records are added ( from id = 61000 to id = 61051 + 61072 and 61073 )
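The same kind of comparison can be sketched in Python once both files are parsed with `json.load`. This is only an illustration on made-up records, assuming each file is a list of objects that carry a unique `id`, as in the samples above. The function names and the report keys are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def without_enus(record: Record) -> Record:
    """Copy of a record with its enUS field left out."""
    return {k: v for k, v in record.items() if k != "enUS"}


def compare_records(first: List[Record], second: List[Record]) -> Dict[str, List[int]]:
    """Report, by id, which records exist in one list only and which differ."""
    a = {rec["id"]: rec for rec in first}
    b = {rec["id"]: rec for rec in second}
    common = a.keys() & b.keys()
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "enus_changed": sorted(i for i in common if a[i].get("enUS") != b[i].get("enUS")),
        "other_changed": sorted(i for i in common if without_enus(a[i]) != without_enus(b[i])),
    }


# Made-up records, only to show the shape of the report
first = [
    {"id": 1, "Key": "a", "enUS": "A", "deDE": "x"},
    {"id": 2, "Key": "b", "enUS": "B"},
    {"id": 3, "Key": "c", "enUS": "C"},
]
second = [
    {"id": 1, "Key": "a", "enUS": "A2", "deDE": "x"},
    {"id": 2, "Key": "b", "enUS": "B", "deDE": "neu"},
    {"id": 4, "Key": "d", "enUS": "D"},
]
report = compare_records(first, second)
print(report)
```

A record can appear in both `enus_changed` and `other_changed`, which corresponds to the third case in the list above.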


                                Thus, @ted-plum :

                                • Generally speaking, do we have to modify the first !item-names.json file, using data from the item-names.json file or the opposite ?

                                • Do we have to take care of the enUS changes ONLY or do we have to take all the changes into account ?

                                • Do we have to add/delete the new lines, as well ?

                                Best Regards

                                guy038

                                • T
                                  Ted Plum @Mark Olson
                                  last edited by Apr 3, 2023, 2:02 PM

                                  @Mark-Olson, I’ve poked around with the script you’d written, but no luck so far. When trying to run it I keep getting errors such as:

                                  No module named ‘Npp’

                                  or, if I try running it without the “from Npp import *” line, I get:

                                  name ‘notepad’ is not defined

                                  Am I missing something?

                                  • Michael VincentM
                                    Michael Vincent @Ted Plum
                                    last edited by Apr 3, 2023, 2:12 PM

                                    @Ted-Plum said in Replace Lines from different files:

                                    No module named ‘Npp’
                                    or, if I try running it without the “from Npp import *” line, I get:
                                    name ‘notepad’ is not defined

                                    You are running this from the PythonScript plugin of Notepad++, yes?

                                    Cheers.

                                    • T
                                      Ted Plum @guy038
                                      last edited by Ted Plum Apr 3, 2023, 2:19 PM Apr 3, 2023, 2:12 PM

                                      @guy038, the file item-names.json has to remain intact and encoded in UTF-8, except for its enUS values, which should be taken from the corresponding records in !item-names.json. No new lines should be added, and lines that don’t exist in !item-names.json should not be removed from item-names.json. The file with the exclamation mark is just a donor file that has the right enUS values.
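These requirements can also be sketched as a standalone Python 3 script that runs outside Notepad++ and therefore needs no `Npp` module. This is a minimal sketch under several assumptions, not the PythonScript version posted earlier: both files are JSON lists of objects, records are matched by their `Key` field, and only records that already have an `enUS` field are updated. The function names are hypothetical, and `main` is defined but not called, since the paths are yours to supply.

```python
import json
from typing import Any, Dict, List

Record = Dict[str, Any]


def transfer_enus(donor: List[Record], target: List[Record]) -> int:
    """Copy enUS values from donor to target records with the same Key.

    Records are never added or removed; the number of changed records is returned.
    """
    donor_enus = {rec["Key"]: rec["enUS"] for rec in donor if "Key" in rec and rec.get("enUS")}
    changed = 0
    for rec in target:
        new_value = donor_enus.get(rec.get("Key"))
        if new_value and "enUS" in rec and rec["enUS"] != new_value:
            rec["enUS"] = new_value
            changed += 1
    return changed


def main(donor_path: str, target_path: str) -> None:
    """Read both files, transfer the enUS values, and rewrite the target as UTF-8."""
    with open(donor_path, encoding="utf-8-sig") as fh:  # utf-8-sig tolerates a BOM
        donor = json.load(fh)
    with open(target_path, encoding="utf-8") as fh:
        target = json.load(fh)
    transfer_enus(donor, target)
    with open(target_path, "w", encoding="utf-8", newline="\n") as fh:
        # ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes
        json.dump(target, fh, ensure_ascii=False, indent=2)


# In-memory demonstration with records shaped like the samples in this thread
donor = [{"id": 2257, "Key": "gfr", "enUS": "Flawed ruby"}, {"id": 2258, "Key": "gsr"}]
target = [
    {"id": 2257, "Key": "gfr", "enUS": "ÿc8•ÿc:Flawed R", "deDE": "fehlerhafter Rubin"},
    {"id": 2258, "Key": "gsr", "enUS": "ÿc8•ÿc:Ruby"},
    {"id": 61000, "Key": "new", "enUS": "Only in target"},
]
changed = transfer_enus(donor, target)
print(changed, target[0]["enUS"])
```

Note that rewriting the file with `json.dump` normalizes its indentation and spacing, so a diff against the old file may show layout changes in addition to the enUS changes. Work on a copy of item-names.json first.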

                                      • T
                                        Ted Plum @Michael Vincent
                                        last edited by Ted Plum Apr 3, 2023, 2:21 PM Apr 3, 2023, 2:15 PM

                                        @Michael-Vincent, yes, I’ve installed Python, installed the PythonScript v2 plugin into NPP, and for good measure tried the NppExec plugin. I open both of my JSON files in NPP, open a file with your script pasted into it, and run it through NPP.

                                        Yes, I’ve also renamed the JSON files to match your script.

                                        • Alan KilbornA
                                          Alan Kilborn @Ted Plum
                                          last edited by Apr 3, 2023, 2:22 PM

                                          @Ted-Plum said in Replace Lines from different files:

                                          No module named ‘Npp’

                                          yes, I’ve installed Python, installed PythonScript v2, and for good measure tried NppExec. I open both of my JSON files in NPP, open a file with your script pasted into it, and run it through NPP.

                                          There’s no need to install Python as a standalone thing. If there is, then pursuing problems with that is off-topic to this forum.

                                          Please make sure you’ve done due-diligence at reading, understanding and following the basic steps to Pythonscripting, found in the FAQ for this forum HERE.

                                          and for good measure tried NppExec

                                          Doing random things isn’t going to help.
