How to find copy and replace "quoted text" from one text file to another
-
I have devised a method for doing this pretty efficiently using JsonTools or another scripting plugin.
If the OP is content to do this using regular expressions, which I again advise against, I won’t stand in their way. But I am happy to share my solution if people care.
-
@Mark-Olson said in How to find copy and replace "quoted text" from one text file to another:
But I am happy to share my solution if people care.
Well I for one would like to see your solution. When I read the OP I did think of that plugin but after a bit of reading on the plugin it didn’t suggest it would help. So I would like to know how.
Especially as this does seem to be one of the questions asked on a fairly regular occasion. Maybe if it is a polished solution it could be made into a FAQ post.
Terry
-
I’m going to show the PythonScript approach, because while JsonTools can solve this specific problem, it requires running a couple more plugin commands at present.
I apologize that this is basically a Python tutorial covered in a thin veneer of Notepad++-specific content. I would rather provide the most useful answer, even if its NPP-specificity is not so high. In any case, further questions about the Python involved in this script are best directed to a general-purpose programming forum.
'''
requires PythonScript v3 or higher: https://github.com/bruderstein/PythonScript/releases
docs: https://docs.python.org/3/library/json.html
ref: https://community.notepad-plus-plus.org/topic/25294/how-to-find-copy-and-replace-quoted-text-from-one-text-file-to-another/11?_=1704210088779
'''
import json
from Npp import *

# the file is divided up into valid JSON documents by lines of ----------
DOC_SEP = '\r\n' + '-'*10 + '\r\n'
json_strings = editor.getText().split(DOC_SEP)
# parse every json document
jsons = [json.loads(s) for s in json_strings]
# every name we care about is in the 'data' field of the first object
first_json = jsons[0]['data']
# the names we care about are in the 'name' field of the 'basic' field
# of each object under first_json
names = {product_name: obj['basic']['name'] for product_name, obj in first_json.items()}
for ii in range(1, len(jsons)):
    # loop through all the other JSONs, setting the ['basic']['name'] field
    # to the name found for that same product in the first object
    other_json = jsons[ii]
    for product_name, obj in other_json.items():
        obj['basic']['name'] = names[product_name]
# now all the JSONs have the same (product name)-name correspondences.
# we just dump all the JSON back in the same format as it was originally in
# with ---------- separating the documents
new_json_strings = [json.dumps(obj, indent=4) for obj in jsons]
editor.setText(DOC_SEP.join(new_json_strings))
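For anyone who wants to try the idea outside Notepad++, here is a minimal sketch of the same logic as a plain function on a string. It is not the script as posted: it assumes that every document (not just the first) keeps its products under a top-level 'data' key, it skips products the first document does not mention, and the key "mod_a" in the sample is made up for illustration.

```python
import json

# same separator as the script: a line of ten dashes between JSON documents
DOC_SEP = '\r\n' + '-' * 10 + '\r\n'

def sync_names(text, sep=DOC_SEP):
    """Copy each product's ['basic']['name'] from the first document into the others."""
    docs = [json.loads(chunk) for chunk in text.split(sep)]
    # ASSUMPTION: products live under 'data' in every document
    names = {key: obj['basic']['name'] for key, obj in docs[0]['data'].items()}
    for doc in docs[1:]:
        for key, obj in doc['data'].items():
            # leave products untouched if the first document has no name for them
            if key in names:
                obj['basic']['name'] = names[key]
    return sep.join(json.dumps(doc, indent=4) for doc in docs)

# tiny two-document sample; "mod_a" is a hypothetical product key
sample = DOC_SEP.join([
    '{"data": {"mod_a": {"basic": {"name": "AK-101 30-round polymer magazine"}}}}',
    '{"data": {"mod_a": {"basic": {"name": "EC-101 standard magazine"}}}}',
])
result = sync_names(sample)
```

Note that `json.loads` only accepts strict JSON, so each chunk between the separator lines has to parse on its own for either version to run.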
-
-
Terry-R said
I’ve taken a bit of interest in your problem, partly as I was involved in both of the topics you mentioned in your OP.
Thanks Terry, I appreciate it. You really do understand what my issue is and what I am trying to do.
I have indeed used #16287 in my poorly implemented solution, you were correct the first time.
#16246 was not so helpful, but it still led me towards using a regex Replace to do this, so it helped.
Specifically, I am using Scott Sumner’s solution (/topic/16287/replace-certain-text-in-lines-with-each-line-another-file/8?_=1704215793400),
where they even illustrate the steps the Replace goes through using yellow markers.
I tried some of the others, but this was the only one I was able to understand and implement myself.
I will explain what I do step by step, all the way to the issue I am facing (sorry if it is long).
- On the source file I do Ctrl+M and I input on Mark > Find what: “name”. Bookmark Line, Match Case, Wrap Around, Normal, are all selected (enabled).
- Search > Bookmark > Copy bookmarked lines
- On the destination file, I go to the very top, and Paste the copied lines. At the last one, I create an empty line, and then type 10 - (like this ----------)
I then remove all the “name” : parts from those newly pasted lines.
- The result now looks something like the following (I cropped it heavily, for the sake of keeping this short. I will not take up tons of forum space unnecessarily):
"AK-101 30-round polymer magazine",
"AK-101 Circle 10 30-round magazine",
"AK series standard barrel",
"AK series long barrel",
----------
{
    "data" : {
        "mod_ak101_magazine_standard" : {
            "basic" : {
                "categoria" : "w_mod",
                "description" : "",
                "name" : "EC-101 standard magazine",
                "scrap" : "material",
                "sprite_ingame" : "s_mod_ak101_magazine_standard_game",
            },
        "mod_ak101_magazine_tactical" : {
            "basic" : {
                "categoria" : "w_mod",
                "description" : "",
                "name" : "EC-101 tactical magazine",
                "scrap" : "material",
                "sprite_ingame" : "s_mod_ak101_magazine_tactical_game",
            },
        "mod_ak74_barrel_1" : {
            "basic" : {
                "categoria" : "w_mod",
                "description" : "",
                "name" : "EC series short barrel",
                "scrap" : "material",
                "sprite_ingame" : "s_mod_ak74_barrel_1_game",
            },
        "mod_ak74_barrel_2" : {
            "basic" : {
                "categoria" : "w_mod",
                "description" : "",
                "name" : "EC series long barrel",
                "scrap" : "material",
A very simplified version, from which I removed all unnecessary lines just for clarity:
"AK-101 30-round polymer magazine",
"AK-101 Circle 10 30-round magazine",
"AK series standard barrel",
"AK series long barrel",
"AK FAB Defense AG-43 pistol grip (black)",
"AK FAB Defence AG-43 pistol grip (Flat Dark Earth)",
"AK standard wood pistol grip",
"AK standard polymer pistol grip",
"AK Magpul MOE M-LOK handguard (black)",
"AK Magpul MOE M-LOK handguard (Flat Dark Earth)",
----------
"name" : "EC-101 standard magazine",
"name" : "EC-101 tactical magazine",
"name" : "EC series short barrel",
"name" : "EC series long barrel",
"name" : "EC Tacto SAVV Pistol grip",
"name" : "EC Tacto SAVV Pistol grip desert",
"name" : "EC standard pistol grip",
"name" : "EC standard pistol grip",
"name" : "EC handguard",
"name" : "EC desert handguard",
- And now for the hard part:
Based on Scott Sumner’s solution,
I use this modified regex on Find what: (?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"\d+ \w+)|(?:\R+-{10}\R)
And on Replace with: $+{group2}$+{group1}
Wrap around and Regular expression are both enabled (checked).
However, this doesn’t actually work: 0 occurrences were replaced in the entire file. It did go through the file though, and the regex is tested and works correctly (with debugger).
- Now here’s the lame part: Going to the first “name” field and removing a couple of letters from it seems to do the copy successfully.
So I change "name" : "EC-101 standard magazine",
to "name" : "101 standard magazine",
And now the regex actually performs 1 occurrence replacement when Replace All is pressed.
So now the result is this:
"name" : "AK-101 30-round polymer magazine", magazine",
The above is ofc undesirable, as it left the magazine", part in. You can see what it replaced and what it did not.
To cut a long story short, this regex was designed for and expects something like:
100 cups
080 bowls
200 John
So when it finds letters/words at the beginning, it cannot perform a replace.
The actual issue, as I said earlier, is that pesky \d+ \w+ part of the regex, which orders it to look for digits and then words. If it doesn’t find them in that order, then it doesn’t do a Replace operation.
I need a regex which tells it to replace anything it finds within the quotes. So that even if there is something like
“name” : “df46%y7qdg fyy-554g 546dd-fwh54t y57wdt4”
it should still delete the part in the quotation marks after “name” and paste the lines I feed it from the very top.
Or better yet, do a complete replacement of the entire line which has “name” in it, and substitute it with my own from the top.
For clarity, when I say “top” I am referring to this:
“AK-101 30-round polymer magazine”,
“AK-101 Circle 10 30-round magazine”,
“AK series standard barrel”,
And these lines were stripped from the “name” : part manually by me.
P.S. I am not a programmer, I don’t really know how to code, and I am not really trying to be one. Pls refrain from criticizing this as if it’s professional work; it is not.
P.P.S. I am sorry for taking up so much forum space, but I wanted to make it clear and respond to Terry R’s request to show the combined data. Hopefully I did so successfully.
-
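What is being asked for here (replace whatever sits between the quotes after "name", in order, with the lines from the top) can be sketched outside the Replace dialog in a few lines of Python. This is only an illustration under assumptions: the helper name `replace_names_in_order` is made up, the replacement names are taken to be plain text without embedded quotes, and any "name" field left over once the list runs out is kept unchanged.

```python
import re

# matches `"name" : "<anything up to the closing quote>"`; group 1 is the part to keep
NAME_VALUE = re.compile(r'("name"\s*:\s*)"[^"\r\n]*"')

def replace_names_in_order(text, new_names):
    """Give the n-th "name" field in text the n-th entry of new_names."""
    names = iter(new_names)

    def swap(match):
        new = next(names, None)
        if new is None:
            # ran out of replacements: keep the original field as it was
            return match.group(0)
        return f'{match.group(1)}"{new}"'

    return NAME_VALUE.sub(swap, text)

text = (
    '"name" : "EC-101 standard magazine",\n'
    '"scrap" : "material",\n'
    '"name" : "df46%y7qdg fyy-554g 546dd-fwh54t y57wdt4",\n'
)
new_names = ["AK-101 30-round polymer magazine", "AK-101 Circle 10 30-round magazine"]
result = replace_names_in_order(text, new_names)
```

The pattern does not care what the old value looks like, which is exactly the property the `\d+ \w+` version lacks.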
mkupper
Thanks, this actually worked like a charm, but there is the slight problem of having to manually build this part at the top:
x "mod_ak101_magazine_standard" : { "name" : "AK-101 30-round polymer magazine" },
x "mod_ak101_magazine_tactical" : { "name" : "AK-101 Circle 10 30-round magazine" },
I am able to strip the first part with the x and put it in the same text file, but I don’t know how to put the second part after every line.
So all I can get is this:
x "mod_ak101_magazine_standard" : {
x "mod_ak101_magazine_tactical" : {
x "mod_ak74_barrel_1" : {
x "mod_ak74_barrel_2" : {
x "mod_ak74_grip_2_b" : {
x "mod_ak74_grip_2_y" : {
x "mod_ak74_grip_s_2" : {
x "mod_ak74_grip_s_3" : {
x "mod_ak74_handguard_2_b" : {
x "mod_ak74_handguard_2_y" : {
----
"name" : "AK-101 30-round polymer magazine",
"name" : "AK-101 Circle 10 30-round magazine",
"name" : "AK series standard barrel",
"name" : "AK series long barrel",
"name" : "AK FAB Defense AG-43 pistol grip (black)",
"name" : "AK FAB Defence AG-43 pistol grip (Flat Dark Earth)",
"name" : "AK standard wood pistol grip",
"name" : "AK standard polymer pistol grip",
"name" : "AK Magpul MOE M-LOK handguard (black)",
"name" : "AK Magpul MOE M-LOK handguard (Flat Dark Earth)",
Of course I can copy/paste each line manually to put it next to the x "mod_ part, but if I am to do that, then I might as well do the actual intended operation manually anyway.
Your solution actually worked though, it copies each line successfully without issues.
It just didn’t really save me any time, it seems.
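The missing step described here, gluing the n-th "name" line onto the n-th x "mod_..." line, is a plain pairwise join. A minimal Python sketch follows, assuming the two blocks have the same number of lines in matching order and that the target format is the one quoted above; the function name `build_lookup` is made up for illustration.

```python
def build_lookup(key_lines, name_lines):
    """Pair each `x "key" : {` line with its `"name" : "...",` line, one lookup row each."""
    if len(key_lines) != len(name_lines):
        raise ValueError("key and name blocks must have the same number of lines")
    rows = []
    for key, name in zip(key_lines, name_lines):
        # drop the trailing comma of the name line, then close the brace
        rows.append(f'{key.rstrip()} {name.strip().rstrip(",")} }},')
    return rows

key_lines = [
    'x "mod_ak101_magazine_standard" : {',
    'x "mod_ak101_magazine_tactical" : {',
]
name_lines = [
    '"name" : "AK-101 30-round polymer magazine",',
    '"name" : "AK-101 Circle 10 30-round magazine",',
]
lookup = build_lookup(key_lines, name_lines)
```

The length check matters: if the two blocks ever get out of step, every later name lands on the wrong product, so failing loudly is safer than a silent shift.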
-
@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:
However, this doesnt actually work. 0 occurences were replaced in the entire file. It did go through the file though, and the regex is tested and works correctly (with debugger).
I looked at the regex in step #5 you said was based on Scott’s solution. I can easily see why it doesn’t work, and unless you know regex you would find it difficult to understand all of what it tries to do. I think you did allude to the problem in a previous post, namely \d+ \w+. This will never match the existing name field, so no replacement can take place.
I did a test based on your first example set (the second, where extra lines are removed, doesn’t help, as sometimes the other lines DO matter when testing). In the end I came up with the following version of the Find What field:
(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"[^"]+")|(?:\R+-{10}\R)
Now this does select the name field. I’m not sure if you understand what is actually happening, but it will only replace one set for each push of the Replace All button. That means you will need to run this multiple times until the whole file is changed.
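The difference between the two tails of the expression can be checked in isolation. The sketch below uses Python's `re` module rather than Notepad++'s Boost engine, and only the final part of each expression (the named groups and `\R` are left out because Python spells them differently), so treat it as an illustration of the matching behaviour, not of the full Replace.

```python
import re

OLD_TAIL = re.compile(r'"name" : "\d+ \w+')   # needs digits, a space, then word characters
NEW_TAIL = re.compile(r'"name" : "[^"]+"')    # accepts anything up to the closing quote

original = '"name" : "EC-101 standard magazine",'
edited = '"name" : "101 standard magazine",'

# the old tail cannot match: the value starts with "E", not with a digit
old_on_original = OLD_TAIL.search(original)

# after hand-editing the value it matches, but only as far as `101 standard`,
# which is why ` magazine",` was left behind after the replacement
old_on_edited = OLD_TAIL.search(edited).group(0)

# the new tail matches the whole quoted value, whatever it contains
new_on_original = NEW_TAIL.search(original).group(0)
```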
I wonder if you have considered my solution here. The benefit here is although it takes a bit more effort to extract (and massage) the data from both files, once in a new tab the regex will only need to run once to complete all the changes. Then the data is re-inserted back into the original file, and re-sorted and then the line numbers are removed. If you aren’t sure of my solution I am willing to explain it again, this time more in line with your need.
Terry
-
Terry-R
Thanks for the response Terry.
Ok I tried your proposed regex, it does work, does replace the field inside the quotes successfully, but the command actually replaces the same “name” : entry every time I press Replace All.
I think it basically keeps running from top to bottom every time, and since it first runs across the topmost “name” field, it just keeps replacing that one.
All the other “name” fields are untouched. Derp.
At first I was happy because I thought “this is perfect, finally something that works great”, but…nah, lol.
I also tried the solution you linked me to, in thread #16287.
I tried to follow all the steps to the letter, but on step #9 I run into an issue: the regex command. What else…lol.
I replaced “contents” in that command with “name” because obviously it needs to fit my usage, but the command doesn’t work for some reason.
This is the slightly altered regex command I used (Find what): \d+\h(\d+\h)(.+?"name" : "").+?(\).+?\R)\d+\h(.+)\R
Debuggex is telling me that it is valid, no errors found.
So I am stuck at step #9, with all the original “name” : fields and their intended replacements right below each one, they are all numbered, and tbh this solution seems overly complicated anyway.
Is there any way you can adapt your regex here to go to the next “name” field after replacing the first one?
(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"[^"]+")|(?:\R+-{10}\R)
EDIT: Oh btw I wanted to say, this command I have used and is in the OP, actually goes to the next “name” field, one after another:
(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"\d+ \w+)|(?:\R+-{10}\R)
But each “name” field needs to have a number and then a word.
So your adapted version did fix that issue, but now it doesn’t go to the next “name” field, so it introduced another issue. What a nightmare lol.
(This whole thing is ending up being more trouble than its worth really…)
-
@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:
So your adapted version did fix that issue, but now it doesn’t go to the next “name” field,
Yes, sorry about that. I was really only concerned with the \d+ \w+ portion and my modified version sorted that. What I didn’t realise was that the original solution relied on the old text (to be replaced) being different in structure from the new text (to be inserted). Thus there would need to be more alterations to get it to work. And since the original solution post DID say it needs to run multiple times, I think in your case it would be tedious to continue holding the play button down until 0 occurrences completed shows up.
So onto my proposed solution. In order to get an appropriate regex for step #9 we’d need to see an example of the combined data. So again put the example in a code block. This should comprise sets of 2 lines, with the odd numbered line being from one file and the even numbered line of each set being from the other file. We need to know whether the odd line is providing the new text or the old text.
Sometimes it does take a bit of effort to get to a good solution, however you really only have a small number of choices. Either learn python (or some other programming language), go with a regex solution (I know @Mark-Olson is cringing at this point since he firmly believes JSON files aren’t to be messed with by regex), or do it manually. Since manually would seem very cumbersome you are left with 2 choices. Sometimes it takes time to find a good process. The idea is, that once it is found it can be performed time and time again without issue.
At the end of the day only you can judge how well the process works. I hope you do understand that it will require you to spot check the results, until such time as you feel safe in the knowledge it won’t harm the data. Solutions such as this are only as good as the examples provided and we make no promises, but then you aren’t paying for the solution anyways.
Terry
-
@Terry-R said in How to find copy and replace "quoted text" from one text file to another:
And since the original solution post DID say it needs to run multiple times I think in your case it would be tedious to continue holding the play button down until 0 occurrences completed shows up.
I do need to keep pressing Replace All to do all entries, but a macro being run a few thousand times should fix that, I already tested this after all. It is not tedious, since I only need to record pressing Replace All once as a macro, and then run that macro a few hundred times, the feature is already in Notepad++.
Anyway, I have an idea:
If I had a regex command that looks for every "name" : field in the file, and replaces "name" : "insert text here" with "name" : "123", then the original regex I used in the OP will work, since the \d+ in it definitely works.
Nvm, you already gave me the answer there: "[^"]+"
I did the following:
Find what: "name" : "[^"]+"
Replace with: "name" : "123"
then I did Find what:
(?-i)(?:(?<group1>(?-s).+)(?<group2>(?s).+?-{10}.+?"name" : )"\d+)|(?:\R+-{10}\R)
and Replace with: $+{group2}$+{group1}
Finally, Run the recorded macro a few hundred times.
It took about 5 minutes to finish, give or take, but it did it.
This actually wasn’t so bad.
-
@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:
“name” : “EC-101 standard magazine”,
“name” : “EC series short barrel”,
Well, to supply you with what you want, here it is:
Find What: (?-s)("name" : ")[^"\r\n]+"
Replace With: ${1}1234"
It does seem from your many posts in this thread that you are finding the regex very difficult to understand. At some point you do need to start the process of understanding. I don’t mean the one that Scott provided, that is a bit more complex, but this one above should be something you could master very quickly. As a suggestion, there is a FAQ post here that’s a very good starting point. Using the regex101.com site to describe a particular regex will definitely help. Bear in mind that its test doesn’t always work on a valid regex for Notepad++, since they use different flavours of regular expression engines, but most of the time it will.
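To make the placeholder replacement easier to follow, here is roughly the same operation written for Python's `re` module. It is a sketch in a different regex flavour, not what Notepad++ runs: the leading (?-s) is dropped because `.` is not used, and the replacement is spelled `\g<1>` because that is how Python separates a group reference from the digits that follow it (the job `${1}` does in Notepad++).

```python
import re

# group 1 keeps `"name" : "`; the rest of the quoted value is what gets replaced
NAME_FIELD = re.compile(r'("name" : ")[^"\r\n]+"')

text = (
    '"name" : "EC-101 standard magazine",\n'
    '"name" : "EC series short barrel",\n'
)

# \g<1> re-emits the kept prefix; 1234" is the literal placeholder plus closing quote
result = NAME_FIELD.sub(r'\g<1>1234"', text)
```

Writing the replacement as `\11234"` instead would be read as a reference to a group numbered 11 or higher, which is why the explicit group syntax is needed in both flavours.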
Good luck (with the thousands of runs to complete)
Terry
PS, just about to post when I see you got it all on your own, well actually you realised I’d already provided the important bit.
-
Terry-R
Do you know how I can merge 2 lines together, i.e. step #9 in your solution in the other thread?
I want to try this method now, see if its actually better in practice.
And better yet, combine your solution with mkupper’s solution, and get the ideal one for my use.
-
@DankiestCitra
As I said in a previous post, you need to provide a small set of examples of the combined data. From that I can cook up a suitable regex.
Terry
-
Terry-R
Alright Terry, here it is:
001 "name" : "EC-101 standard magazine",
002 "name" : "AK-101 30-round polymer magazine",
003 "name" : "EC-101 tactical magazine",
004 "name" : "AK-101 Circle 10 30-round magazine",
005 "name" : "EC series short barrel",
006 "name" : "AK series standard barrel",
007 "name" : "EC series long barrel",
008 "name" : "AK series long barrel",
009 "name" : "EC Tacto SAVV Pistol grip",
010 "name" : "AK FAB Defense AG-43 pistol grip (black)",
011 "name" : "EC Tacto SAVV Pistol grip desert",
012 "name" : "AK FAB Defence AG-43 pistol grip (Flat Dark Earth)",
013 "name" : "EC standard pistol grip",
014 "name" : "AK standard wood pistol grip",
015 "name" : "EC standard pistol grip",
016 "name" : "AK standard polymer pistol grip",
This is step #8, I am trying to get to step #9, which would merge the 2nd line of each pair into position in the first line.
This is tab #3 btw.
-
@DankiestCitra
Thanks for the examples.
This regex should work (provided the data copied perfectly into the code block).
Find What:
(?-s)^(\d+).+\R\d+(.+\R)
Replace With: ${1}${2}
Just so you know what it is doing: it keeps (stores) the line number of the first line, as that will be used to place it back into the correct position later on. It then reads over the remainder of that line and the number at the start of the second line (of each pair). Then it reads the remainder of the second line, storing that.
The replace field writes the 2 stored values back.
Hope that helps.
Terry
PS forgot to add, make sure the last line contains nothing, so in effect add a blank line. This is necessary as the regex expects a CR and/or LF after EVERY line.
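What the pair-merging regex does to the numbered data can be shown with a Python approximation. Python has no `\R`, so this sketch uses `\r?\n` instead, and `re.MULTILINE` stands in for Notepad++'s line-anchored `^`; the sample is the first two pairs from the post above.

```python
import re

# group 1: the first line's number; group 2: the second line without its number
PAIR = re.compile(r'^(\d+).+\r?\n\d+(.+\r?\n)', re.MULTILINE)

combined = (
    '001 "name" : "EC-101 standard magazine",\n'
    '002 "name" : "AK-101 30-round polymer magazine",\n'
    '003 "name" : "EC-101 tactical magazine",\n'
    '004 "name" : "AK-101 Circle 10 30-round magazine",\n'
)

# each pair collapses into one line: old line number + new text
merged = PAIR.sub(r'\g<1>\g<2>', combined)
```

Each pair becomes a single line carrying the odd line number and the even line's text, so the old "EC" names disappear by design; only the new text, tagged with its original position, is kept.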
-
@Terry-R
But this simply deleted the original ones. What is the point of that?
-
@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:
But this simply deleted the original ones
Nope, you said to merge the 2nd line of each pair into position in the first line. That’s what the regex does. So you will be left with the original line number together with the new data. Isn’t that what you want? Then that tab #3 data can be reinserted into the original file, re-sorted and the line numbering removed.
Terry
-
@Terry-R
I’m sorry, I know you are probably annoyed a lot by me right now, but this method has confused the hell out of me.
Your regex really did delete every other line, honestly. I am sure I didn’t mess it up, it’s just a copy/paste.
I did some digging, found a different regex, and now I use this modified version of it:
Find: (?-s)^(.+)\R(.+)\R
Replace: \1\2\r\n
This successfully merged 2 lines together, every other line.
I will actually go for mkupper’s solution as this is closest to what I really want to do.
-
@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:
I know you are probably annoyed a lot by me right now, but this method has confused the hell out of me.
No, not annoyed but it does seem as if you didn’t understand the concept. When I used your examples and my regex I get this:
which is what I think you need. There is no need to keep both sets of data, as you are only interested in re-inserting this updated data back into the original file. The other file was only a donor and doesn’t need to be kept. Or at least, if it is kept, make sure you are working on a copy which you can edit without affecting the real files.
But if you can cobble together a process that makes you feel happy and seems to work for you, then good.
Terry
-
@Terry-R
But why do this long process just to keep the data that I need to insert? I can already do that easily by Marking, bookmarking and Copy/pasting the bookmarked lines.
Then I can manipulate the beginning and end of the text as I see fit, to add/remove spaces, characters at the end, etc.
But you are correct in that I didn’t actually understand what your regex was supposed to do. I misunderstood completely, and I apologize for that.
In any case, your solution did help me, because it taught me how to separate the lines into alternating pairs, numbered appropriately. I have now successfully used @mkupper’s solution together with your steps 1 through 8 (in the other thread), which are fairly straightforward and easy: I formatted the text I wanted into the format that mkupper showed me, and this works and is actually faster and better, with fewer mistakes possible.
Thank you very much @mkupper and @Terry-R for your input, I do realize I am a complete noob, but I did learn some things in the process.
P.S. The solution ended up being quite complicated for me, with more than 16 total steps to complete, but now that I learned it, it just seems easy.
-
@DankiestCitra said in How to find copy and replace "quoted text" from one text file to another:
But why do this long process just to keep the data that I need to insert?
I couldn’t understand that statement until I looked again at my original solution in #16287.
So I think I now see some of your confusion. You tried to follow the steps in #16287, but as I had alluded to earlier I was going to redo it for your requirement. The original solution relied on the data being dissimilar enough that the pairs would align correctly. In the end you just wanted a regex and I complied.
What I should have done was to redo that solution for you. In essence what we need to do for both files is:
- add line numbers to all lines starting from 1 onwards
- bookmark and then cut and paste in another tab the “name” lines
- redo line numbering but this time one set will have odd numbers and the 2nd set will have even numbers. this might mean they are pasted into different tabs first, given new numbers and then combined into 1 tab.
- sort lines based on their line numbering. This combines them so new data is alongside the old data.
- use a regex (as I showed earlier) to combine the 2 lines so that the “original” line number now has the new data. this also removes the 2nd line numbering added in step #3.
- Re-insert the combined data into the original file and re-sort based on line numbers.
- remove the line numbers.
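The numbering, sorting and re-inserting in the steps above exist to remember where each “name” line belongs. In a script that bookkeeping reduces to remembering line indexes, as in this minimal Python sketch. It assumes both files contain the same number of “name” lines in the same order, that whole lines are swapped (so the donor's indentation and trailing comma come along), and that lines are joined back with "\n"; the function name `transplant_name_lines` is made up for illustration.

```python
import re

# a line counts as a "name" line if it starts (after indentation) with "name" :
NAME_LINE = re.compile(r'^\s*"name"\s*:')

def transplant_name_lines(recipient_text, donor_text):
    """Replace the n-th "name" line of the recipient with the n-th one of the donor."""
    recipient = recipient_text.splitlines()
    # equivalent of bookmarking and copying the "name" lines out of the donor
    donor_names = [line for line in donor_text.splitlines() if NAME_LINE.match(line)]
    # equivalent of the line numbers: where each "name" line sits in the recipient
    slots = [i for i, line in enumerate(recipient) if NAME_LINE.match(line)]
    if len(slots) != len(donor_names):
        raise ValueError("files do not contain the same number of name lines")
    # equivalent of merging the pairs and re-inserting by line number
    for index, new_line in zip(slots, donor_names):
        recipient[index] = new_line
    return "\n".join(recipient)

recipient_text = (
    '{\n'
    '    "name" : "EC-101 standard magazine",\n'
    '    "scrap" : "material",\n'
    '    "name" : "EC series short barrel",\n'
    '}'
)
donor_text = (
    '    "name" : "AK-101 30-round polymer magazine",\n'
    '    "description" : "",\n'
    '    "name" : "AK series standard barrel",'
)
result = transplant_name_lines(recipient_text, donor_text)
```

As with the manual process, the result is only as good as the assumption that the two files list their products in the same order, so spot checking the output is still worthwhile.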
The “donor” file, which holds the new data, is edited during the process, so if it is needed in its original state, work on a copy of it. The recipient file is edited on purpose and can be either a copy (generally the best idea, especially when running a new process the first few times) or the original if the process is tried and true. If it is a copy, it is copied over the top of the original after spot checking the result.
Hopefully that makes it a bit clearer.
Terry