How to find copy and replace "quoted text" from one text file to another

mkupper

I think @DankiestCitra is asking for something that normally can only be handled by scripting.

I do not think Notepad++ has a way to extract fields from one file and drop them into fields in another file.

I know someone could do a mark-all and then Copy-Marked-Text to extract information from a file but Notepad++ does not offer a way to paste or merge information into a file while at the same time doing lookups.

For example, in text1.json we essentially have

...
"mod_ak101_magazine_standard" : { "name" : "AK-101 30-round polymer magazine" },
"mod_ak101_magazine_tactical" :  { "name" : "AK-101 Circle 10 30-round magazine" },
...

The OP wants to scan text1.json and for each node, such as “mod_ak101_magazine_standard” to get the value of the “name” field, and then to scan text2.json, to look up “mod_ak101_magazine_standard” within that file and to overwrite its “name” field value.

The OP wants to use a JSON file like a database. JSON is a data interchange format, not a database. There may be tools out there that allow for manipulation of a JSON file as though it’s a database but I’m certain those tools employ something that looks like scripting so that users can define how the manipulation should be done.
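To make the "scripting" point concrete, here is a minimal sketch of that lookup in plain Python. It assumes text1.json has the flat shape shown above and that text2.json nests each entry under "data", then the key, then "basic", then "name"; the function name `copy_names` and the tiny stand-in documents are made up for illustration.

```python
import json

def copy_names(source, target):
    """Overwrite target["data"][key]["basic"]["name"] with source[key]["name"].

    Keys that are missing from the target are skipped.
    Returns the number of name fields that were updated.
    """
    updated = 0
    for key, entry in source.items():
        basic = target.get("data", {}).get(key, {}).get("basic")
        if basic is not None and "name" in entry:
            basic["name"] = entry["name"]
            updated += 1
    return updated

# Stand-ins for text1.json and text2.json (assumed shapes, see the note above).
text1 = json.loads(
    '{"mod_ak101_magazine_standard": {"name": "AK-101 30-round polymer magazine"}}'
)
text2 = json.loads(
    '{"data": {"mod_ak101_magazine_standard":'
    ' {"basic": {"name": "EC-101 standard magazine"}}}}'
)

count = copy_names(text1, text2)
print(count)
print(json.dumps(text2, indent=4))
```

Because the lookup is by key rather than by position, the order of entries in the two files does not matter in this sketch.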

DankiestCitra

mkupper you mentioned scripting, that would be ideal, but this wasn’t what I was looking for; that is not the specific reason for making this post. I do realize that Notepad++ doesn’t have such scripting features.
The Replace feature using regex was able to work by simply going through the “name” fields one by one, and it did it, but the issue is that my regex command was not done right.
Basically it does the copy, not from the other file, but from the same file when I give it the lines that I want to feed to the destination, but simply cannot replace the original text. I have to specify \d+ \w+ for the digits and words that the text has got, and when the text has only words or more digits than specified, it fails to replace it all or simply doesn’t copy at all.

If you look at this I posted in the OP

(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"\d+ \w+)|(?:\R+-{10}\R)

This part right here seems to be my issue: "\d+ \w+

If someone understands these regular expressions really well, then perhaps they’ll be able to help.
The above regex when used in Replace (with wrap around and regular expression selected), goes for 1 line after another, and tries to copy and replace text that is found right after “name” :

So it’s something like this:

1 Text to be copied
2 Text to be copied
3 Text to be copied
4 Text to be copied
----------
{
	"data" : {
		"ak_101" : {
			"basic" : {
				"categoria" : "weapon",
				"description" : "",
				"name" : "EC 101",
				"scrap" : "weapon",
				"sprite_ingame" : "s_mod_ak101_base_game",

It does actually work, but isn’t able to delete what is after “name” : but merely expects a certain number of digits and words, and if it doesn’t find what it expects, then it fails. If there are more words or numbers within the quotes it just leaves them in, which is of course unacceptable.

I hope this makes sense, I tried to explain as best as I can.

Terry R

@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:

I hope this makes sense, I tried to explain as best as I can.

I’ve taken a bit of interest in your problem, partly as I was involved in both of the topics you mentioned in your OP.

Now I’m not sure but it does seem as though you have elected to use the solution as proposed in topic #16287. The issue mainly seems as though you are unable to cook up a regex that will successfully move (copy) the required text across so you can complete the remainder of the steps.

If so, could you show an excerpt of the file/tab showing the “combined data”. From that it will be much easier for us to help you with a regex that will allow you to move forward.

As you have done before put the example/excerpt in the code block so data isn’t corrupted.

Terry

PS after yet another read, I’m now not so certain you followed the topic #16287, but maybe the other. The other topic can easily fail, especially when there is a large amount of data to process, which it does seem you have. I think #16287 would show more promise, even if it does seem to take longer (and more steps) to complete.

mkupper

@DankiestCitra, you can do it in one file at a time, not two.

Consider this:

...
x "mod_ak101_magazine_standard" : { "name" : "AK-101 30-round polymer magazine" },
x "mod_ak101_magazine_tactical" : { "name" : "AK-101 Circle 10 30-round magazine" },
...

"data" : {
	"mod_ak101_magazine_standard" : {
		"basic" : {
			"categoria" : "w_mod",
			"description" : "",
			"name" : "EC-101 standard magazine",
			"scrap" : "material",
			"sprite_ingame" : "s_mod_ak101_magazine_standard_game",
			"sprite_inv" : "s_mod_ak101_magazine_standard",
			"stack_max" : 1,
			"value" : 600,
			"weight" : 0.1
	},
	"mod_ak101_magazine_tactical" : {
		"basic" : {
			"categoria" : "w_mod",
			"description" : "",
			"name" : "EC-101 tactical magazine",
			"scrap" : "material",
			"sprite_ingame" : "s_mod_ak101_magazine_tactical_game",

Search: (?s-i)^x "([^"]+)" : \{ "name" : "([^"]+)" },\R(.+\R[ \t]+"\1" : \{.+?\R[ \t]+"name" : ")[^"]*
Replace: \3\2

What I did was to build a list of the things I need to search for and the new name values. Those are the lines I have at the top that start with the letter x. I copy/pasted that list into top of the second file.

The search part has:

(?s-i) - Allow dot to match end-of-line marks and make the match case-sensitive so that “Name” is not the same as “name”.
^x "([^"]+)" : \{ "name" : "([^"]+)" },\R - Find a line that starts with the ‘x’ keyword/tag and grab the keyword in \1 and the new value for the name field in \2.
(.+\R[ \t]+"\1" : \{.+?\R[ \t]+"name" : ") - This is \3 and is a scan of the entire file until I get to a line that starts with blanks/tabs "\1" : \{. The scanning continues, this time with a non-greedy .+? until we get to a line that starts with blanks/tabs "name" : "
The final [^"]* erases the old value of the name field.

The replacement is simply \3\2 with \3 being the entire file from just after that x line we found down to just after the leading " for the name field we are seeking. \2 drops the desired value in place.

You will need to rerun the replace all multiple times. Each time, it will process and remove one more of the x lines. In this sense you are running a script. I suppose you could run a macro that does the replace a few thousand times.
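The "rerun until nothing is left" behaviour can be tried outside Notepad++ with a rough Python translation of the search. This is only a sketch under stated assumptions: Python's `re` module has no `\R`, so line endings are assumed to be plain `\n`, and the `(?s-i)` prefix is replaced by the `re.DOTALL | re.MULTILINE` flags. The helper name `apply_x_lines` and the sample text are made up for illustration.

```python
import re

# Translation of the search above: \R becomes \n, and the flags are passed
# separately because Python does not accept (?s-i) as a global prefix.
PATTERN = re.compile(
    r'^x "([^"]+)" : \{ "name" : "([^"]+)" \},\n'
    r'(.+\n[ \t]+"\1" : \{.+?\n[ \t]+"name" : ")[^"]*',
    re.DOTALL | re.MULTILINE,
)

def apply_x_lines(text):
    """Repeat the replacement until no x line matches; return (text, passes)."""
    passes = 0
    while True:
        # One "Replace All": group 3 swallows any later x lines, so in this
        # sample only one x line is consumed per pass.
        text, n = PATTERN.subn(r"\3\2", text)
        if n == 0:
            return text, passes
        passes += 1

sample = "\n".join([
    'x "mod_a" : { "name" : "New A" },',
    'x "mod_b" : { "name" : "New B" },',
    '',
    '"data" : {',
    '\t"mod_a" : {',
    '\t\t"basic" : {',
    '\t\t\t"name" : "Old A",',
    '\t},',
    '\t"mod_b" : {',
    '\t\t"basic" : {',
    '\t\t\t"name" : "Old B",',
    '\t},',
])

result, passes = apply_x_lines(sample)
print(passes)
print(result)
```

The loop plays the role of the macro: it stops by itself once a pass replaces nothing, which is the same signal as "0 occurrences were replaced" in the Replace dialog.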

Mark Olson

I have devised a method for doing this pretty efficiently using JsonTools or another scripting plugin.

If the OP is content to do this using regular expressions, which I again advise against, I won’t stand in their way. But I am happy to share my solution if people care.

Terry R

@Mark-Olson said in How to find copy and replace "quoted text" from one text file to another:

But I am happy to share my solution if people care.

Well I for one would like to see your solution. When I read the OP I did think of that plugin but after a bit of reading on the plugin it didn’t suggest it would help. So I would like to know how.

Especially as this does seem to be one of the questions asked on a fairly regular occasion. Maybe if it is a polished solution it could be made into a FAQ post.

Terry

Mark Olson

I’m going to show the PythonScript approach, because while JsonTools can solve this specific problem, it requires running a couple more plugin commands at present.

I apologize that this is basically a Python tutorial covered in a thin veneer of Notepad++-specific content. I would rather provide the most useful answer, even if its NPP-specificity is not so high. In any case, further questions about the Python involved in this script are best directed to a general-purpose programming forum.

'''
requires PythonScript v3 or higher: https://github.com/bruderstein/PythonScript/releases
docs: https://docs.python.org/3/library/json.html
ref: https://community.notepad-plus-plus.org/topic/25294/how-to-find-copy-and-replace-quoted-text-from-one-text-file-to-another/11?_=1704210088779
'''
import json
from Npp import *

# the file is divided up into valid JSON documents by lines of ----------
DOC_SEP = '\r\n' + '-'*10 + '\r\n'

json_strings = editor.getText().split(DOC_SEP)
# parse every json document
jsons = [json.loads(s) for s in json_strings]
# every name we care about is in the 'data' field of the first object
first_json = jsons[0]['data']
# the names we care about are in the 'name' field of the 'basic' field
# of each object under first_json
names = {product_name: obj['basic']['name']
    for product_name, obj in first_json.items()}

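for ii in range(1, len(jsons)):
    # loop through all the other JSONs, setting the ['basic']['name'] field
    # to the name found for that same product in the first object
    # (the products live under 'data', just as in the first document)
    other_json = jsons[ii]['data']
    for product_name, obj in other_json.items():
        # leave products that are missing from the first document untouched
        if product_name in names:
            obj['basic']['name'] = names[product_name]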
        
# now all the JSONs have the same (product name)-name correspondences.
# we just dump all the JSON back in the same format as it was originally in
# with ---------- separating the documents

new_json_strings = [json.dumps(obj, indent=4) for obj in jsons]
editor.setText(DOC_SEP.join(new_json_strings))

DankiestCitra

Terry-R said

I’ve taken a bit of interest in your problem, partly as I was involved in both of the topics you mentioned in your OP.

Thanks Terry, I appreciate it. You really do understand what my issue is and what I am trying to do.

I have indeed used #16287 in my poorly implemented solution, you were correct the first time.
#16246 was not so helpful, but it still led me towards the whole using regex Replace to do this, so it helped.

Specifically, I am using Scott Sumner’s solution, (/topic/16287/replace-certain-text-in-lines-with-each-line-another-file/8?_=1704215793400)
where they even illustrate the steps the Replace goes through using yellow markers.
I tried some of the others, but this was the only one I was able to understand and implement myself.

I will explain what I do step by step, all the way to the issue I am facing (sorry if it is long).

On the source file I do Ctrl+M and I input on Mark > Find what: “name”. Bookmark Line, Match Case, Wrap Around, Normal, are all selected (enabled).
Search > Bookmark > Copy bookmarked lines
On the destination file, I go to the very top, and Paste the copied lines. At the last one, I create an empty line, and then type 10 - (like this ----------)
I then remove all the “name” : parts from those newly pasted lines.
The result now looks something like the following (I cropped it heavily, for the sake of keeping this short. I will not take up tons of forum space unnecessarily):

"AK-101 30-round polymer magazine",
"AK-101 Circle 10 30-round magazine",
"AK series standard barrel",
"AK series long barrel",
----------
{
	"data" : {
		"mod_ak101_magazine_standard" : {
			"basic" : {
				"categoria" : "w_mod",
				"description" : "",
				"name" : "EC-101 standard magazine",
				"scrap" : "material",
				"sprite_ingame" : "s_mod_ak101_magazine_standard_game",
		},
		"mod_ak101_magazine_tactical" : {
			"basic" : {
				"categoria" : "w_mod",
				"description" : "",
				"name" : "EC-101 tactical magazine",
				"scrap" : "material",
				"sprite_ingame" : "s_mod_ak101_magazine_tactical_game",
		},
		"mod_ak74_barrel_1" : {
			"basic" : {
				"categoria" : "w_mod",
				"description" : "",
				"name" : "EC series short barrel",
				"scrap" : "material",
				"sprite_ingame" : "s_mod_ak74_barrel_1_game",
		},
		"mod_ak74_barrel_2" : {
			"basic" : {
				"categoria" : "w_mod",
				"description" : "",
				"name" : "EC series long barrel",
				"scrap" : "material",

A very very simplified version which I removed all unnecessary lines just for clarity:

"AK-101 30-round polymer magazine",
"AK-101 Circle 10 30-round magazine",
"AK series standard barrel",
"AK series long barrel",
"AK FAB Defense AG-43 pistol grip (black)",
"AK FAB Defence AG-43 pistol grip (Flat Dark Earth)",
"AK standard wood pistol grip",
"AK standard polymer pistol grip",
"AK Magpul MOE M-LOK handguard (black)",
"AK Magpul MOE M-LOK handguard (Flat Dark Earth)",
----------
				"name" : "EC-101 standard magazine",
				"name" : "EC-101 tactical magazine",
				"name" : "EC series short barrel",
				"name" : "EC series long barrel",
				"name" : "EC Tacto SAVV Pistol grip",
				"name" : "EC Tacto SAVV Pistol grip desert",
				"name" : "EC standard pistol grip",
				"name" : "EC standard pistol grip",
				"name" : "EC handguard",
				"name" : "EC desert handguard",

And now for the hard part:
Based on Scott Sumner’s solution,
I use this modified regex on Find what: (?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"\d+ \w+)|(?:\R+-{10}\R)
And on Replace with: $+{group2}$+{group1}
Wrap around and Regular expression are both enabled (checked).

However, this doesn’t actually work. 0 occurrences were replaced in the entire file. It did go through the file though, and the regex is tested and works correctly (with debugger).

Now here’s the lame part: Going to the first “name” field and removing a couple of letters from it seems to do the copy successfully.
So I change "name" : "EC-101 standard magazine", to "name" : "101 standard magazine",
And now the regex actually performs 1 occurrence replacement when Replace All is pressed.
So now the result is this:

"name" : "AK-101 30-round polymer magazine", magazine",

The above is ofc undesirable, as it left the magazine", part in. You can see what it replaced and what it did not.

To cut a long story short, this regex was designed for and expects something like:

100 cups
080 bowls
200 John

So when it finds letters/words at the beginning, it cannot perform a replace.
The actual issue as I said earlier, is that pesky \d+ \w+ part of the regex, which orders it to look for digits and then words. If it doesn’t find them in that order, then it doesn’t do a Replace operation.

I need a regex which tells it to replace anything it finds within the quotes. So that even if there is something like
“name” : “df46%y7qdg fyy-554g 546dd-fwh54t y57wdt4”
it should still delete the part in the quotation marks after “name” and paste the lines I feed it from the very top.

Or better yet, do a complete replacement of the entire line which has “name” in it, and substitute it with my own from the top.
For clarity, when I say “top” I am referring to this:
“AK-101 30-round polymer magazine”,
“AK-101 Circle 10 30-round magazine”,
“AK series standard barrel”,
And these lines were stripped from the “name” : part manually by me.

P.S. I am not a programmer, I don’t really know how to code, and I am not really trying to be one. Pls refrain from criticizing this as if it’s professional work; it is not.
P.P.S. I am sorry for taking up so much forum space, but I wanted to make it clear and respond to Terry R’s request to show the combined data. Hopefully I did so successfully.

DankiestCitra

mkupper
Thanks, this actually worked like a charm, but there is the slight problem of having to manually build this part at the top:

x "mod_ak101_magazine_standard" : { "name" : "AK-101 30-round polymer magazine" },
x "mod_ak101_magazine_tactical" : { "name" : "AK-101 Circle 10 30-round magazine" },

I am able to strip the first part with the x and put it in the same text file, but the second part, I don’t know how to put these after every line.
So what I can only have is this

x "mod_ak101_magazine_standard" : {
x "mod_ak101_magazine_tactical" : {
x "mod_ak74_barrel_1" : {
x "mod_ak74_barrel_2" : {
x "mod_ak74_grip_2_b" : {
x "mod_ak74_grip_2_y" : {
x "mod_ak74_grip_s_2" : {
x "mod_ak74_grip_s_3" : {
x "mod_ak74_handguard_2_b" : {
x "mod_ak74_handguard_2_y" : {
----
				"name" : "AK-101 30-round polymer magazine",
				"name" : "AK-101 Circle 10 30-round magazine",
				"name" : "AK series standard barrel",
				"name" : "AK series long barrel",
				"name" : "AK FAB Defense AG-43 pistol grip (black)",
				"name" : "AK FAB Defence AG-43 pistol grip (Flat Dark Earth)",
				"name" : "AK standard wood pistol grip",
				"name" : "AK standard polymer pistol grip",
				"name" : "AK Magpul MOE M-LOK handguard (black)",
				"name" : "AK Magpul MOE M-LOK handguard (Flat Dark Earth)",

Of course I can copy/paste each line manually to put it next to the x "mod_ part, but if I am to do that, then I might as well do the actual intended operation manually anyway.
Your solution actually worked though, it copies each line successfully without issues.

It just didn’t really save me any time it seems.
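Building the x lines by hand is the kind of step a short script can take over. The sketch below generates them from the source file, assuming that file is complete, valid JSON with the same "data", key, "basic", "name" nesting as the destination (the excerpts in this thread are cropped, so that is an assumption). The function name `build_x_lines` is hypothetical.

```python
import json

def build_x_lines(source_json_text):
    """Return one 'x "key" : { "name" : "value" },' line per entry under "data"."""
    data = json.loads(source_json_text)["data"]
    lines = []
    for key, obj in data.items():
        name = obj["basic"]["name"]
        # Note: a name containing a double quote would break the x-line format.
        lines.append(f'x "{key}" : {{ "name" : "{name}" }},')
    return "\n".join(lines)

# Tiny stand-in for the source file.
source = """
{
    "data": {
        "mod_ak101_magazine_standard": {
            "basic": {"name": "AK-101 30-round polymer magazine"}
        },
        "mod_ak101_magazine_tactical": {
            "basic": {"name": "AK-101 Circle 10 30-round magazine"}
        }
    }
}
"""

x_lines = build_x_lines(source)
print(x_lines)
```

The printed lines can then be pasted at the top of the destination file, which is the list the regex approach expects.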

Terry R

@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:

However, this doesnt actually work. 0 occurences were replaced in the entire file. It did go through the file though, and the regex is tested and works correctly (with debugger).

I looked at the regex in step #5 you said was based on Scott’s solution. I can easily see why it doesn’t work and unless you know regex you would find it difficult to understand all of what it tries to do. I think you did allude to the problem in a previous post, namely \d+ \w+. This will never capture the existing Name field so a replacement can work.

I did a test based on your first example set (the second where extra lines are removed doesn’t help as sometimes the other lines DO matter when testing). In the end I came up with the following version of the Find What field:
(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"[^"]+")|(?:\R+-{10}\R)

Now this does select the name field. I’m not sure if you understand what is actually happening but it will only replace one set for each push of the Replace All button. That means you will need to run this multiple times until the whole file is changed.
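The difference between the two endings can be checked in isolation. The sketch below uses Python's `re` module rather than Notepad++'s engine, and it tests only the tail of each regex (the part after "name"), not the full expression; for these simple constructs the two engines should behave the same way.

```python
import re

# Tail of the original regex: digits, a space, then word characters.
OLD = re.compile(r'"name" : "\d+ \w+')
# Tail of the revised regex: anything up to the closing quote.
NEW = re.compile(r'"name" : "[^"]+"')

line = '\t"name" : "EC-101 standard magazine",'
edited = '\t"name" : "101 standard magazine",'

old_on_line = OLD.search(line)                 # None: the value starts with letters
old_on_edited = OLD.search(edited).group(0)    # stops after the first word
new_on_line = NEW.search(line).group(0)        # covers the whole quoted value

print(old_on_line)
print(old_on_edited)
print(new_on_line)
```

The second result shows where the leftover magazine", came from: \w+ stops at the next space, so everything after the first word stays in the file.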

I wonder if you have considered my solution here. The benefit here is although it takes a bit more effort to extract (and massage) the data from both files, once in a new tab the regex will only need to run once to complete all the changes. Then the data is re-inserted back into the original file, and re-sorted and then the line numbers are removed. If you aren’t sure of my solution I am willing to explain it again, this time more in line with your need.

Terry

DankiestCitra

Terry-R
Thanks for the response Terry.
Ok I tried your proposed regex, it does work, does replace the field inside the quotes successfully, but the command actually replaces the same “name” : entry every time I press Replace All.
I think it basically keeps running from top to bottom every time, and since it first runs across the topmost “name” field, it just keeps replacing that one.

All the other “name” fields are untouched. Derp.
At first I was happy because I thought “this is perfect, finally something that works great”, but…nah, lol.

I also tried the solution you linked me to, in thread #16287.
I tried to follow all the steps to the letter, but on step #9 I run into an issue: the regex command. What else…lol.
I replaced “contents” in that command with “name” because obviously it needs to fit my usage, but the command doesn’t work for some reason.
This is the slightly altered regex command I used (Find what): \d+\h(\d+\h)(.+?"name" : "").+?(\).+?\R)\d+\h(.+)\R
Debuggex is telling me that it is valid, no errors found.
So I am stuck at step #9, with all the original “name” : fields and their intended replacements right below each one, they are all numbered, and tbh this solution seems overly complicated anyway.

Is there any way you can adapt your regex here to go to the next “name” field after replacing the first one?

(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"[^"]+")|(?:\R+-{10}\R)

EDIT: Oh btw I wanted to say, this command I have used and is in the OP, actually goes to the next “name” field, one after another:
(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"\d+ \w+)|(?:\R+-{10}\R)

But each “name” field needs to have a number and then a word.

So your adapted version did fix that issue, but now it doesn’t go to the next “name” field, so it introduced another issue. What a nightmare lol.

(This whole thing is ending up being more trouble than it’s worth really…)

Terry R

@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:

So your adapted version did fix that issue, but now it doesn’t go to the next “name” field,

Yes, sorry about that. I was really only concerned with the \d+ \w+ portion and my modified version sorted that. What I didn’t realise was that the original solution relied on the old text (to be replaced) being different in structure from the new text (to be inserted). Thus there would need to be more alterations to get it to work. And since the original solution post DID say it needs to run multiple times I think in your case it would be tedious to continue holding the play button down until 0 occurrences completed shows up.

So onto my proposed solution. So in order to get an appropriate regex for step #9 we’d need to see an example of the combined data. So again put the example in a code block. This should comprise sets of 2 lines, with the odd numbered line being from one file and the even numbered line of each set being from the other file. We need to know if odd line is providing the new text or is it the old text line.

Sometimes it does take a bit of effort to get to a good solution, however you really only have a small number of choices. Either learn python (or some other programming language), go with a regex solution (I know @Mark-Olson is cringing at this point since he firmly believes JSON files aren’t to be messed with by regex), or do it manually. Since manually would seem very cumbersome you are left with 2 choices. Sometimes it takes time to find a good process. The idea is, that once it is found it can be performed time and time again without issue.

At the end of the day only you can judge how well the process works. I hope you do understand that it will require you to spot check the results, until such time as you feel safe in the knowledge it won’t harm the data. Solutions such as this are only as good as the examples provided and we make no promises, but then you aren’t paying for the solution anyways.

Terry

DankiestCitra

@Terry-R said in How to find copy and replace "quoted text" from one text file to another:

And since the original solution post DID say it needs to run multiple times I think in your case it would be tedious to continue holding the play button down until 0 occurrences completed shows up.

I do need to keep pressing Replace All to do all entries, but a macro being run a few thousand times should fix that, I already tested this after all. It is not tedious, since I only need to record pressing Replace All once as a macro, and then run that macro a few hundred times, the feature is already in Notepad++.

Anyway, I have an idea:
If I had a regex command that looks for every "name" : field in the file, and replaces "name" : "insert text here" with "name" : "123" then the original regex I used in the OP will work, since the \d+ in it definitely works.

Nvm, you already gave me the answer there "[^"]+"
I did the following:
Find what: "name" : "[^"]+"
Replace with: "name" : "123"
then I did Find what:
(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"\d+)|(?:\R+-{10}\R)
and replace with $+{group2}$+{group1}

Finally, Run the recorded macro a few hundred times.
It took about 5 minutes to finish, give or take, but it did it.

This actually wasn’t so bad.
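For comparison, a script can skip the placeholder step and the repeated runs by pairing the Nth replacement line with the Nth "name" field directly. This is a sketch in plain Python, not what the macro does internally; it assumes the replacement list is in the same order as the name fields, and the function name `replace_names_in_order` is made up.

```python
import re

NAME_FIELD = re.compile(r'("name" : ")[^"\r\n]*"')

def replace_names_in_order(text, new_names):
    """Replace the Nth "name" value in text with the Nth item of new_names."""
    remaining = iter(new_names)

    def substitute(match):
        try:
            return match.group(1) + next(remaining) + '"'
        except StopIteration:
            # More name fields than replacements: leave the extras untouched.
            return match.group(0)

    return NAME_FIELD.sub(substitute, text)

destination = (
    '\t"name" : "EC-101 standard magazine",\n'
    '\t"scrap" : "material",\n'
    '\t"name" : "EC-101 tactical magazine",\n'
)
new_names = [
    "AK-101 30-round polymer magazine",
    "AK-101 Circle 10 30-round magazine",
]

updated = replace_names_in_order(destination, new_names)
print(updated)
```

Matching by position is fragile if either file gains or loses an entry; matching by key, where the keys are available, avoids that.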

Terry R

@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:

“name” : “EC-101 standard magazine”,
“name” : “EC series short barrel”,

Well to supply you with what you want here it is:
Find What:(?-s)("name" : ")[^"\r\n]+"
Replace With:${1}1234"

It does seem from your many posts in this thread that you are finding the regex very difficult to understand. At some point you do need to start the process of understanding. I don’t mean the one that Scott provided, that is a bit more complex, but this one above should be something you could master very quickly. As a suggestion, there is a FAQ post here that’s a very good starting point. Using the regex101.com site to describe a particular regex will definitely help. Bear in mind that its test doesn’t always work on a valid regex for Notepad++, since they use different flavours of regular expression engines, but most of the time it will.
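As one example of flavours differing, here is the same placeholder replacement written for Python's `re` module. The search pattern carries over unchanged apart from dropping `(?-s)`, which is not needed here because the pattern contains no dot. The replacement does change: Python writes the group reference as `\g<1>`, which serves the same purpose as the braces in `${1}`, keeping the group number separate from the digits that follow.

```python
import re

text = (
    '"name" : "EC-101 standard magazine",\n'
    '"name" : "EC series short barrel",\n'
)

# \g<1> is group 1; a bare \1 followed by 1234 would be read as a
# different (non-existent) group number.
placeholder = re.sub(r'("name" : ")[^"\r\n]+"', r'\g<1>1234"', text)
print(placeholder)
```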

Good luck (with the thousands of runs to complete)
Terry

PS, just about to post when I see you got it all on your own, well actually you realised I’d already provided the important bit.

DankiestCitra

Terry-R
Do you know how I can merge 2 lines together, i.e. step #9 in your solution in the other thread?
I want to try this method now, see if it’s actually better in practice.
And better yet, combine your solution with mkupper’s solution, and get the ideal one for my use.

Terry R

@DankiestCitra
As I said in a previous post, you need to provide a small set of examples of the combined data. From that I can cook up a suitable regex.

Terry

DankiestCitra

Terry-R
Alright Terry, here it is:

001 "name" : "EC-101 standard magazine",
002 "name" : "AK-101 30-round polymer magazine",
003 "name" : "EC-101 tactical magazine",
004 "name" : "AK-101 Circle 10 30-round magazine",
005 "name" : "EC series short barrel",
006 "name" : "AK series standard barrel",
007 "name" : "EC series long barrel",
008 "name" : "AK series long barrel",
009 "name" : "EC Tacto SAVV Pistol grip",
010 "name" : "AK FAB Defense AG-43 pistol grip (black)",
011 "name" : "EC Tacto SAVV Pistol grip desert",
012 "name" : "AK FAB Defence AG-43 pistol grip (Flat Dark Earth)",
013 "name" : "EC standard pistol grip",
014 "name" : "AK standard wood pistol grip",
015 "name" : "EC standard pistol grip",
016 "name" : "AK standard polymer pistol grip",

This is step #8, I am trying to get to step #9, which would merge the 2nd line of each pair into position in the first line.
This is tab #3 btw.

Terry R

@DankiestCitra
Thanks for the examples.

This regex should work (provided data copied perfectly into code block)

Find What:(?-s)^(\d+).+\R\d+(.+\R)
Replace With:${1}${2}

Just so you know what it is doing. It keeps (stores) the first line “line number” as that will be used to place it back into the correct position later on. It then reads over the remainder of that line and the number at the start of the second line (of each pair). Then it reads the remainder of the second line, storing that.
The replace field writes the 2 stored values back.

Hope that helps.
Terry

PS forgot to add, make sure the last line contains nothing, so in effect add a blank line. this is necessary as the regex expects a CR and/or LF after EVERY line.
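The effect of this pair-merging regex can be previewed on a few lines with Python's `re` module. This is a sketch under two assumptions: line endings are plain `\n` (Python has no `\R`), and `re.MULTILINE` stands in for the way `^` matches at every line start in Notepad++.

```python
import re

# (?-s)^(\d+).+\R\d+(.+\R) translated for Python: \R becomes \n.
PAIR = re.compile(r'^(\d+).+\n\d+(.+\n)', re.MULTILINE)

combined = (
    '001 "name" : "EC-101 standard magazine",\n'
    '002 "name" : "AK-101 30-round polymer magazine",\n'
    '003 "name" : "EC-101 tactical magazine",\n'
    '004 "name" : "AK-101 Circle 10 30-round magazine",\n'
)

# Keep the number of the first line and the content of the second line.
merged = PAIR.sub(r'\1\2', combined)
print(merged)
```

Each pair collapses to one line that carries the odd line's number and the even line's text. If the final newline is missing, the last pair does not match in this translation, which mirrors the reason for adding a blank line at the end.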

DankiestCitra

@Terry-R
But this simply deleted the original ones. What is the point of that?

Terry R

@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:

But this simply deleted the original ones

Nope, you said merge the 2nd line of each pair into position in the first line. That’s what the regex does. So you will be left with the original line number together with the new data. Isn’t that what you want? Then that tab#3 data can be reinserted into the original file, resorted and the line numbering removed.

Terry