Replace Lines from different files

Ted Plum

@Mark-Olson I’m not a programmer, I’m in the process of learning HTML and CSS currently, so my knowledge is limited. I’d like to learn this sort of skill, could you point me to where to begin, to make it faster. I hope it doesn’t involve learning the whole of Python. I want to learn it eventually, but that would probably take too long to be relevant to this particular problem.

guy038

Hello, @ted-plum,

Could you, first, show us some records of your File_2 ?

To my mind, it should be listed, like below, in the SAME order as the enUS values found in File_1 ?

"enUS": "Flawed Ruby",
"enUS": "Text_2"
...
...

Best Regards,

guy038

Ted Plum

@guy038 here:

https://drive.google.com/file/d/1D6eQbp0ZZWdsTO-ARlFCNgd4zJ2zvhvI/view?usp=sharing

https://drive.google.com/file/d/1zKY6pC3KK0egW1IyklMopP-6bJjrAAjw/view?usp=sharing

These are the two versions of the same list. Not all enUS values need replacing: some are the same, and some are absent from the file I want to take the replacements from, so those need to remain as they are.

Mark Olson

@Ted-Plum said in Replace Lines from different files:

I’m not a programmer, I’m in the process of learning HTML and CSS currently, so my knowledge is limited. I’d like to learn this sort of skill, could you point me to where to begin, to make it faster. I hope it doesn’t involve learning the whole of Python.

Nobody learns the whole of Python. It’s a general-purpose language, and learning just a little bit can be really useful for just making your everyday life easier.

For better or for worse, you are currently working with raw JSON, and Python is one of the best ways for working with JSON. JavaScript is also great for this, but Notepad++ doesn’t have JS scripting support.

I’ve looked at your data (I’m a huge Diablo 2 fan, by the way) and here’s my solution:

from Npp import *
import json

# both files must be initially open in Notepad++

FROM_PATH = r'c:\full\path\to\from_items.json'
notepad.activateFile(FROM_PATH) # the file that you will be taking enUS values FROM
text_from = editor.getText() # read the entire file as a Python string
json_from = json.loads(text_from) # this translates the file into Python objects that can be manipulated more easily than text

# now we do the same thing for the file that you will be moving enUS values TO
TO_PATH = r'c:\full\path\to\to_items.json'
notepad.activateFile(TO_PATH)
json_to = json.loads(editor.getText())

# we don't know for sure what items are in from_items,
# so we will only get the ones that exist
from_keys_to_enUS = {}
for item in json_from:
    # if this item has an enUS translation in this file, we map the key
    # (which should be the same across both lists)
    # to the enUS translation.
    enUS = item.get('enUS')
    if enUS:
        from_keys_to_enUS[item['Key']] = enUS

for item in json_to:
    key = item['Key']
    # now we check if the old file has an "enUS" entry for this item
    from_enUS = from_keys_to_enUS.get(key)
    if from_enUS:
        # this transfers the enUS entry to the target file
        item['enUS'] = from_enUS

# now we've transferred enUS values FROM the source file to the target file
# open up the target file and dump our edited values into it
notepad.activateFile(TO_PATH)
editor.setText(json.dumps(json_to, indent=4))

There are so many different ways to approach this problem in Python, and I chose the one that’s most Notepad++ - friendly.
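To try the same logic on a small sample outside Notepad++, here is a minimal sketch as a plain function. The name transfer_enus and the sample records are made up for illustration, and the sketch assumes, as the script above does, that every record has a "Key" field shared by both files.

```python
import json
from collections import OrderedDict


def transfer_enus(donor_items, target_items):
    """Copy non-empty enUS values from donor to target, matching records on 'Key'."""
    donor_enus = {}
    for item in donor_items:
        enus = item.get('enUS')
        if enus:
            donor_enus[item['Key']] = enus
    for item in target_items:
        replacement = donor_enus.get(item['Key'])
        if replacement:
            item['enUS'] = replacement
    return target_items


# Made-up sample data, not taken from the real files
donor_text = '[{"id": 1, "Key": "gem1", "enUS": "Flawed Ruby"}]'
target_text = (
    '[{"id": 1, "Key": "gem1", "enUS": "Old name"},'
    ' {"id": 2, "Key": "Dyedesc", "enUS": "Add this color to your equipment"}]'
)

# object_pairs_hook=OrderedDict keeps the fields in their original order
donor = json.loads(donor_text, object_pairs_hook=OrderedDict)
target = json.loads(target_text, object_pairs_hook=OrderedDict)

merged = transfer_enus(donor, target)
# ensure_ascii=False leaves non-ASCII text (e.g. the zhTW values) unescaped
print(json.dumps(merged, indent=2, ensure_ascii=False))
```

Records that exist only in the target list are left alone, and nothing is added from the donor list.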

As for resources that one could use to get better at Python, three of my best friends when I was a noob were:

StackOverflow (obviously)
the Python standard library documentation (links to json and tutorial)
Python for Everybody.

Ted Plum

@Mark-Olson, thanks, I’ll have a go with the script.

Edit: All right, you’ve answered my question in your edit.

Mark Olson

@Ted-Plum
See the resources that I linked in my most recent edit of the post.

I can’t promise that learning Python will make your life better. Initially I expect you will find it frustrating, but I get the impression that you will find plenty of opportunities to use your learnings before too long.

guy038

Hello, @ted-plum, @mark-olson and All,

Ah… OK, @ted-plum. The two files are quite similar in size and contents !

So, I downloaded your two files from Google Drive

The first one is called !item-names.json. It’s a UTF-8-BOM encoded file with Windows line-breaks ( \r\n )
The second one is called item-names.json. It’s a UTF-8 encoded file with Unix line-break ( \n )
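For anyone who wants to check the same two properties outside Notepad++, here is a small sketch. The function name describe_bytes is made up; it only looks for the UTF-8 BOM (the bytes EF BB BF) and for the first kind of line break it finds, so mixed line endings are not detected.

```python
import codecs


def describe_bytes(raw):
    """Return (has_utf8_bom, eol) for the raw bytes of a file."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    if b'\r\n' in raw:
        eol = 'CRLF'
    elif b'\n' in raw:
        eol = 'LF'
    else:
        eol = 'none'
    return has_bom, eol


# Made-up byte strings standing in for the two files
print(describe_bytes(codecs.BOM_UTF8 + b'[\r\n]'))
print(describe_bytes(b'[\n]'))
```

With a real file, the bytes would come from open(path, 'rb').read().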

In order to compare these two files easily :

I first normalized these two files to the usual UTF-8 encoding and to Windows line-breaks. So :
- I used the Encoding > Convert to UTF-8 option for the !item-names.json file
- I used the Edit > EOL Conversion > Windows (CR LF) option for the item-names.json file
Secondly, I deleted the outer square brackets in the two files
Thirdly, I ran the following regex S/R on these two files
- SEARCH (?<!\},)(?<!\n)\r\n
- REPLACE \t

In order to get a single JSON record per line ( 24,576 occ. for !item-names.json and 25,424 occ. for item-names.json )
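The flattening regex can be tried in Python as well, since its re module accepts the same two fixed-width look-behinds. This is only a sketch on a made-up, shortened sample with Windows line-breaks, not the real file:

```python
import re

# Replace a CRLF with a tab unless it follows "}," (end of a record)
# or another line-break (an empty line)
FLATTEN = re.compile(r'(?<!\},)(?<!\n)\r\n')

sample = (
    '  {\r\n    "id": 1060,\r\n    "Key": "qf1"\r\n  },\r\n'
    '  {\r\n    "id": 1061,\r\n    "Key": "qf2"\r\n  }'
)

flat = FLATTEN.sub('\t', sample)
for line in flat.split('\r\n'):
    print(repr(line))
```

Each record ends up on a single line, with tabs where the inner line-breaks used to be.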

Then, comparing these two files, I noticed that :

Some records are different because field(s), other than enUS, are modified
Some records are different because field enUS is modified
Some records are different because, both, field enUS and field(s), other than enUS, are modified
Near the end of the !item-names.json file, 6 records are deleted ( from id = 27345 to id = 27350 )
Near the end of the item-names.json file, 54 records are added ( from id = 61000 to id = 61051 + 61072 and 61073 )

Thus, @ted-plum :

Generally speaking, do we have to modify the first !item-names.json file, using data from the item-names.json file or the opposite ?
Do we have to take care of the enUS changes ONLY or do we have to take all the changes into account ?
Do we have to add/delete the new lines, as well ?

Best Regards

guy038

Ted Plum

@Mark-Olson, I’ve poked around with the script you’d written, but no luck so far. When trying to run it I keep getting errors such as:

No module named ‘Npp’

or, if I try running it without the “from Npp import *” line, I get:

name ‘notepad’ is not defined

Am I missing something?

Michael Vincent

@Ted-Plum said in Replace Lines from different files:

No module named ‘Npp’
or, if I try running it without the “from Npp import *” line, I get:
name ‘notepad’ is not defined

You are running this from the PythonScript plugin of Notepad++, yes?

Cheers.

Ted Plum

@guy038, the file item-names.json has to remain intact, encoded in UTF-8, except for the corresponding enUS values, which should be taken from the !item-names.json, no new lines should be added, and no lines that don’t exist in !item-names.json should be removed from item-names.json. The file with the exclamation mark is just a donor file, that has the right enUS values.

Ted Plum

@Michael-Vincent, yes, I’ve installed Python, installed the PythonScript v2 plugin into NPP, and for good measure tried the NppExec plugin. I open both of my JSON files in NPP, open a file with your script pasted into it, and run it through NPP.

Yes, I’ve also renamed the JSON files to match your script.

Alan Kilborn

@Ted-Plum said in Replace Lines from different files:

No module named ‘Npp’

yes, I’ve installed Python, installed PythonScript v2, and for good measure tried NppExec. I open both of my JSON files in NPP, open a file with your script pasted into it, and run it through NPP.

There’s no need to install Python as a standalone thing. If there is, then pursuing problems with that is off-topic to this forum.

Please make sure you’ve done due-diligence at reading, understanding and following the basic steps to Pythonscripting, found in the FAQ for this forum HERE.

and for good measure tried NppExec

Doing random things isn’t going to help.

Ted Plum

@Alan-Kilborn, thanks.

Ted Plum

@Michael-Vincent @Mark-Olson, never mind my two previous comments. I’ve managed to run Mark’s script, and here’s the result :)

https://drive.google.com/file/d/1b1yjTI027hss6ue-wJw6FnqyuRsecUiQ/view?usp=sharing

Mark Olson

@Ted-Plum
Glad you got it working!

One final meta-suggestion: if you have a JSON plugin and the ComparePlus plugin installed, you can easily compare two JSON files.
First, you want to pretty-print both JSON files (so that the same formatting rules are applied to both) and then you want to use ComparePlus to compare them.

I do this all the time, and it works quite well.

Alternatively

x = json.loads(json_string_1)
y = json.loads(json_string_2)
assert x == y

will check if json_string_1 and json_string_2 are equivalent JSON.
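If the two files are not equal and you only care about which enUS values differ, a sketch along these lines may help. The function name enus_differences and the sample records are made up, and it assumes both files are lists of records with a "Key" field.

```python
import json


def enus_differences(items_a, items_b):
    """Return {Key: (enUS in a, enUS in b)} for keys in both lists whose enUS differs."""
    b_by_key = {item['Key']: item.get('enUS') for item in items_b}
    diffs = {}
    for item in items_a:
        key = item['Key']
        if key in b_by_key and item.get('enUS') != b_by_key[key]:
            diffs[key] = (item.get('enUS'), b_by_key[key])
    return diffs


# Made-up sample data
a = json.loads('[{"Key": "k1", "enUS": "Old"}, {"Key": "k2", "enUS": "Same"}]')
b = json.loads('[{"Key": "k1", "enUS": "New"}, {"Key": "k2", "enUS": "Same"}, {"Key": "k3", "enUS": "Extra"}]')

print(enus_differences(a, b))
```

Keys that exist in only one of the two lists are ignored here.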

Michael Vincent

@Ted-Plum said in Replace Lines from different files:

your script

Just for clarification, it is @Mark-Olson 's script - I just use PythonScript a lot and an error:

No module named ‘Npp’

Implies a PythonScript is not being run from the PythonScript plugin or the plugin is not installed correctly.
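One way to make that failure more obvious is to guard the import. This is just a sketch, and the flag name RUNNING_IN_NPP is made up:

```python
try:
    # The Npp module only exists inside the PythonScript plugin
    from Npp import notepad, editor
    RUNNING_IN_NPP = True
except ImportError:
    RUNNING_IN_NPP = False

if not RUNNING_IN_NPP:
    print('Not running inside PythonScript: run this from Plugins > PythonScript > Scripts')
```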

Glad it worked out though!

Cheers.

Ted Plum

@Michael-Vincent Yeah, I’ve corrected my reply. My issue with the plugin was that I hadn’t been able to put Mark’s script into the User Scripts menu. I eventually just added it to the Machine Scripts, ran it, and it seems to have worked.

@Mark-Olson, will try all that. I’ve just been looking for a way to compare files in NPP too, thanks.

PeterJones

@Ted-Plum said in Replace Lines from different files:

My issue with the plugin was that I hadn’t been able to put Mark’s script into the User Scripts menu

Assuming a normal installation, which uses the %AppData% hierarchy for Notepad++ plugin settings, your user scripts need to be saved in the folder %AppData%\Notepad++\Plugins\Config\PythonScript\scripts

This is described in Plugins > PythonScript > Context-Help in the “Plugin Installation and Usage” page. It’s also described in the footnotes section of the FAQ that Alan already directed you to.

Ted Plum

@PeterJones Ah, the AppData. I confused that with the Config folder in the installation folder for the NPP. It’s working now :)

guy038

Hi, @ted-plum, @mark-olson, @michael-vincent, @alan-kilborn, @peterjones and All,

At last, here is my regex solution !

First, for a reason that I did not understand totally, the encoding of the !item-names.json file must not be changed to UTF-8. Indeed :

Open the initial UTF-8-BOM file !item-names.json in N++
Change its encoding with the Encoding > Convert to UTF-8 option and save it
Close the !item-names.json file
Re-open the !item-names.json file in N++ => On the status bar, the encoding is changed to ANSI and values in the file are modified ?!

Thus, I decided to, temporarily, normalize these two files to the UTF-8-BOM encoding, with Windows line-break ( \r\n )

In these two files, I deleted the outer square brackets

Then, I ran the following regex S/R, onto these two files :
- SEARCH (?<!\},)(?<!\n)\r\n
- REPLACE \t

In order to get a single JSON record per line. This leaves :

1,589 lines in the item-names.json file
1,536 lines in the !item-names.json file

Then, I created a new UTF-8-BOM file All.json where I pasted, first, the contents of the item-names.json file, then the contents of the !item-names.json file, giving a file of 3,125 lines

At a quick glance, I noticed 3 lines were not aligned with all the others ( easy to get them with the combination MARK ^.{9}id and the Inverse Bookmark option )

After corrections, I ran these regex S/R :

SEARCH (?x) (?<= "id" : \x20 ) \d{4} ,
REPLACE 0$0

In order to normalize all the digits, of the id string, to five digits

Now, in order to differentiate the item-names.json and the !item-names.json contents, I added, using the column-mode selection :

A last digit 0 to the first 1,589 lines of the All.json file
A last digit 9 to the last 1,536 lines of the All.json file

Giving, for instance, at the junction :

  {	    "id": 610720,	    "Key": "Dyedesc",	    "enUS": "Add this color to your equipment",	    "zhTW": "Add this color to your equipment",	    "deDE": "Add this color to your equipment",	    "esES": "Add this color to your equipment",	    "frFR": "Add this color to your equipment",	    "itIT": "Add this color to your equipment",	    "koKR": "Add this color to your equipment",	    "plPL": "Add this color to your equipment",	    "esMX": "Add this color to your equipment",	    "jaJP": "Add this color to your equipment",	    "ptBR": "Add this color to your equipment",	    "ruRU": "Add this color to your equipment",	    "zhCN": "Add this color to your equipment"	  },
  {	    "id": 610730,	    "Key": "Dyerdesc",	    "enUS": "Remove this color on your equipment",	    "zhTW": "Remove this color on your equipment",	    "deDE": "Remove this color on your equipment",	    "esES": "Remove this color on your equipment",	    "frFR": "Remove this color on your equipment",	    "itIT": "Remove this color on your equipment",	    "koKR": "Remove this color on your equipment",	    "plPL": "Remove this color on your equipment",	    "esMX": "Remove this color on your equipment",	    "jaJP": "Remove this color on your equipment",	    "ptBR": "Remove this color on your equipment",	    "ruRU": "Remove this color on your equipment",	    "zhCN": "Remove this color on your equipment"	  }
  {	    "id": 010609,	    "Key": "qf1",	    "enUS": "Khalim's Flail",	    "zhTW": "克林姆的連枷",	    "deDE": "Khalims Kultflegel",	    "esES": "Rompecabezas de Khalim",	    "frFR": "Fléau de Khalim",	    "itIT": "Flagello di Khalim",	    "koKR": "칼림의 도리깨",	    "plPL": "Korbacz Khalima",	    "esMX": "Mangual de Khalim",	    "jaJP": "カリムのフレイル",	    "ptBR": "Mangual de Khalim",	    "ruRU": "Кистень Халима",	    "zhCN": "卡林姆的连枷"	  },
  {	    "id": 010619,	    "Key": "qf2",	    "enUS": "Khalim's Will",	    "zhTW": "克林姆的遺願",	    "deDE": "Khalims Wille",	    "esES": "Voluntad de Khalim",	    "frFR": "Volonté de Khalim",	    "itIT": "Volontà di Khalim",	    "koKR": "칼림의 의지",	    "plPL": "Wola Khalima",	    "esMX": "Voluntad de Khalim",	    "jaJP": "カリムの意志",	    "ptBR": "Vontade de Khalim",	    "ruRU": "Воля Халима",	    "zhCN": "卡林姆的意志"	  },
  {	    "id": 010629,	    "Key": "KhalimFlail",	    "enUS": "Khalim's Flail",	    "zhTW": "克林姆的連枷",	    "deDE": "Khalims Kultflegel",	    "esES": "Rompecabezas de Khalim",	    "frFR": "Fléau de Khalim",	    "itIT": "Flagello di Khalim",	    "koKR": "칼림의 도리깨",	    "plPL": "Korbacz Khalima",	    "esMX": "Mangual de Khalim",	    "jaJP": "カリムのフレイル",	    "ptBR": "Mangual de Khalim",	    "ruRU": "Кистень Халима",	    "zhCN": "卡林姆的连枷"	  },
  {	    "id": 010639,	    "Key": "SuperKhalimFlail",	    "enUS": "Khalim's Will",	    "zhTW": "克林姆的遺願",	    "deDE": "Khalims Wille",	    "esES": "Voluntad de Khalim",	    "frFR": "Volonté de Khalim",	    "itIT": "Volontà di Khalim",	    "koKR": "칼림의 의지",	    "plPL": "Wola Khalima",	    "esMX": "Voluntad de Khalim",	    "jaJP": "カリムの意志",	    "ptBR": "Vontade de Khalim",	    "ruRU": "Воля Халима",	    "zhCN": "卡林姆的意志"	  },

Then I used the Edit > Line Operations > Sort Lines Lexicographically Ascending option to sort the contents of the All.json file
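The padding, tagging and sorting steps can be sketched in Python as follows. The helper name pad_and_tag and the shortened records are made up, the tag digit is added with a regex instead of the column-mode selection, and Python's sorted() is assumed to order these lines the same way as the lexicographic sort in N++.

```python
import re

FOUR_DIGIT_ID = re.compile(r'(?<="id": )\d{4},')
FIVE_DIGIT_ID = re.compile(r'(?<="id": )(\d{5}),')


def pad_and_tag(lines, tag):
    """Pad 4-digit ids to 5 digits, then append the tag digit to every id."""
    tagged = []
    for line in lines:
        line = FOUR_DIGIT_ID.sub(lambda m: '0' + m.group(0), line)
        line = FIVE_DIGIT_ID.sub(lambda m: m.group(1) + tag + ',', line)
        tagged.append(line)
    return tagged


# Made-up, shortened records (one flattened record per line)
target_lines = [
    '  {\t    "id": 1060,\t    "Key": "qf1"\t  },',
    '  {\t    "id": 61072,\t    "Key": "Dyedesc"\t  },',
]
donor_lines = ['  {\t    "id": 1060,\t    "Key": "qf1"\t  },']

combined = sorted(pad_and_tag(target_lines, '0') + pad_and_tag(donor_lines, '9'))
ids = [re.search(r'"id": (\d+)', line).group(1) for line in combined]
print(ids)
```

After the sort, each donor line (tag 9) sits directly below the target line (tag 0) with the same id.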

After the sort operation, the beginning of the All.json file becomes as below :

  {	    "id": 010600,	    "Key": "qf1",	    "enUS": "Khalim's Flail",	    "zhTW": "克林姆的連枷",	    "deDE": "Khalims Kultflegel",	    "esES": "Rompecabezas de Khalim",	    "frFR": "Fléau de Khalim",	    "itIT": "Flagello di Khalim",	    "koKR": "칼림의 도리깨",	    "plPL": "Korbacz Khalima",	    "esMX": "Mangual de Khalim",	    "jaJP": "カリムのフレイル",	    "ptBR": "Mangual de Khalim",	    "ruRU": "Кистень Халима",	    "zhCN": "卡林姆的连枷"	  },
  {	    "id": 010609,	    "Key": "qf1",	    "enUS": "Khalim's Flail",	    "zhTW": "克林姆的連枷",	    "deDE": "Khalims Kultflegel",	    "esES": "Rompecabezas de Khalim",	    "frFR": "Fléau de Khalim",	    "itIT": "Flagello di Khalim",	    "koKR": "칼림의 도리깨",	    "plPL": "Korbacz Khalima",	    "esMX": "Mangual de Khalim",	    "jaJP": "カリムのフレイル",	    "ptBR": "Mangual de Khalim",	    "ruRU": "Кистень Халима",	    "zhCN": "卡林姆的连枷"	  },
  {	    "id": 010610,	    "Key": "qf2",	    "enUS": "Khalim's Will",	    "zhTW": "克林姆的遺願",	    "deDE": "Khalims Wille",	    "esES": "Voluntad de Khalim",	    "frFR": "Volonté de Khalim",	    "itIT": "Volontà di Khalim",	    "koKR": "칼림의 의지",	    "plPL": "Wola Khalima",	    "esMX": "Voluntad de Khalim",	    "jaJP": "カリムの意志",	    "ptBR": "Vontade de Khalim",	    "ruRU": "Воля Халима",	    "zhCN": "卡林姆的意志"	  },
  {	    "id": 010619,	    "Key": "qf2",	    "enUS": "Khalim's Will",	    "zhTW": "克林姆的遺願",	    "deDE": "Khalims Wille",	    "esES": "Voluntad de Khalim",	    "frFR": "Volonté de Khalim",	    "itIT": "Volontà di Khalim",	    "koKR": "칼림의 의지",	    "plPL": "Wola Khalima",	    "esMX": "Voluntad de Khalim",	    "jaJP": "カリムの意志",	    "ptBR": "Vontade de Khalim",	    "ruRU": "Воля Халима",	    "zhCN": "卡林姆的意志"	  },
  {	    "id": 010620,	    "Key": "KhalimFlail",	    "enUS": "Khalim's Flail",	    "zhTW": "克林姆的連枷",	    "deDE": "Khalims Kultflegel",	    "esES": "Rompecabezas de Khalim",	    "frFR": "Fléau de Khalim",	    "itIT": "Flagello di Khalim",	    "koKR": "칼림의 도리깨",	    "plPL": "Korbacz Khalima",	    "esMX": "Mangual de Khalim",	    "jaJP": "カリムのフレイル",	    "ptBR": "Mangual de Khalim",	    "ruRU": "Кистень Халима",	    "zhCN": "卡林姆的连枷"	  },
  {	    "id": 010629,	    "Key": "KhalimFlail",	    "enUS": "Khalim's Flail",	    "zhTW": "克林姆的連枷",	    "deDE": "Khalims Kultflegel",	    "esES": "Rompecabezas de Khalim",	    "frFR": "Fléau de Khalim",	    "itIT": "Flagello di Khalim",	    "koKR": "칼림의 도리깨",	    "plPL": "Korbacz Khalima",	    "esMX": "Mangual de Khalim",	    "jaJP": "カリムのフレイル",	    "ptBR": "Mangual de Khalim",	    "ruRU": "Кистень Халима",	    "zhCN": "卡林姆的连枷"	  },

Now, the following regex S/R handles each pair of consecutive lines that share the same first 5 digits of the id value but have different enUS values : the pair is replaced with the first line ONLY, whose enUS value is taken from the corresponding enUS value in the second line :

SEARCH (?x-s) ^ ( .{19} ) 0 ( .+? "enUS" : \x20 " ) ( [^"\r\n]+? ) ( " .+ ) \R \1 9 .+? "enUS" : \x20 " (?! \3 " ) ( [^"\r\n]+? ) " .+ \R

REPLACE ${1}0\2\5\4\r\n

=> We get 61 replacements

Then, we simply get rid of all the remaining lines of the !item-names.json with that regex S/R :

SEARCH (?x-s) ^ ( .{19} ) 9 .+ \R

REPLACE Leave EMPTY

After replacement, 1,589 lines should remain in the All.json file ( so the same number of lines as in the item-names.json file )
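The two S/R above can be tried on a tiny sample with Python's re module. This is only a sketch: Python has no \R, so it uses a literal \n and LF-only sample lines, the (?x-s) prefix is replaced with the VERBOSE and MULTILINE flags, and the helper name record plus the sample values are made up.

```python
import re

# Step 1: a "0" line followed by a "9" line with the same first five id digits
# and a different enUS value becomes the "0" line with the "9" line's enUS
MERGE = re.compile(r'''
    ^ ( .{19} ) 0 ( .+? "enUS" : \x20 " ) ( [^"\n]+? ) ( " .+ ) \n
    \1 9 .+? "enUS" : \x20 " (?! \3 " ) ( [^"\n]+? ) " .+ \n
''', re.VERBOSE | re.MULTILINE)

# Step 2: drop every remaining donor ("9") line
DROP_DONOR = re.compile(r'^.{19}9.+\n', re.MULTILINE)


def record(id6, key, enus):
    # Hypothetical helper: builds one shortened, tab-flattened record line
    return '  {\t    "id": %s,\t    "Key": "%s",\t    "enUS": "%s"\t  },\n' % (id6, key, enus)


sample = (
    record('010600', 'qf1', 'Old Flail')
    + record('010609', 'qf1', "Khalim's Flail")
    + record('010610', 'qf2', "Khalim's Will")
    + record('010619', 'qf2', "Khalim's Will")
)

merged = MERGE.sub(r'\g<1>0\2\5\4\n', sample)
result = DROP_DONOR.sub('', merged)
print(result)
```

Only the two target lines remain, and the first one now carries the donor's enUS value.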

Now for the final part of our goal ! We have to get back to the normal layout of a JSON file. So :

We delete the 6th digit of the id string :
- SEARCH (?x) ^ .{19} \K \d
- REPLACE Leave EMPTY
We delete the initial 0 digit of the id string
- SEARCH (?x) ^ .{14} \K 0
- REPLACE Leave EMPTY
We replace any \t character with a normal line-break ( Wait a bit ! )
- SEARCH \t
- REPLACE \r\n
We add the outer square brackets lines at the very beginning and the very end of the All.json file
Finally, we get back to a Unix file with the Edit > EOL Conversion > Unix (LF) option

So, @ted-plum, the All.json file now represents your expected data, which replaces your initial item-names.json file !

Of course, @mark-olson’s Python script is certainly the quicker and better solution, but as you can see, a native N++ solution is also possible !

Best Regards,

guy038