How to get search text's key value in xml file
-
Hi Team, good day!
I need your help with a Notepad++ regex. I have two inputs, as listed below:
- An XML file containing a list of tasks, each with a key value (key="Task-12345678…")
- A list of references such as Task 12-72-45, Subtask 12-77-22, Graphic 71-22-33, etc., placed at the end of the file
I have to find each reference from item 2 in the XML file (item 1) and get the respective task's key value. Kindly refer to the sample file here. Please help me out with this!
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100800" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-100-80 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100801" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-200-801 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>

List of task here

This is the expected result:
TASK 71-21-01-100-80 TASK-712101100800
TASK 71-21-01-200-801 TASK-712101100801
. . . .

Thanks
Ganesan G -
@ganesan-govindarajan said in How to get search text's key value in xml file:
I have to find each reference from item 2 in the XML file (item 1) and get the respective task's key value. Kindly refer to the sample file here. Please help me out with this!
At first glance it does seem quite complicated. However, if you break it down into a number of steps, it becomes a lot easier to create a regex for each step.
So that's the basis of my solution. Other forum members may well put forward other solutions.
1. Make each record (from <task to </task>) sit on one line. In the process I actually remove the last line (</task>) of each record. This makes no difference to the outcome, as all non-essential data is removed in this solution.
So please only do this on a copy of the file; it is a destructive process.
Using the Replace function:
FW:\R^\x20+|</task>\R?
RW: nothing in this field
2. Grab the 2 key pieces of data, removing all the rest of the record.
Using the Replace function:
FW:(?-s)^.+?key="(TASK-\d+).+?<para>(TASK\x20[^\x20]+).+
RW:${2}\t${1}
3. Sort the lines using Edit, Line Operations, Sort lines lexicographically ascending.
4. Now check each record and remove it if its number does not appear in the list of records to find. As the sort in step #3 puts each task to find right before the data found for it, they form a pair. This step uses the Mark function to mark the paired lines; the unmarked lines are then removed.
Using the Mark function:
FW:^(TASK\x20[^\x20]+)\R\1
Tick Bookmark line and click Mark All. Paired lines will have a sphere/circle icon at the start of the line. Close the Mark function window.
5. Use Search, Bookmark, Remove Unmarked Lines. What's left are pairs of lines: the first line of each pair is the value being searched for, the second line is the data found.
At this point I will leave it to you to determine the final look of the file. Since you combined the before look with the after look it was a bit confusing as to exactly what format you wanted. I think the result I have produced is probably close enough.
Terry
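For readers who want to check the logic outside Notepad++, here is a minimal Python sketch of the five steps above. It is an approximation, not Terry's method itself: the sample is abbreviated and hypothetical (most attributes and elements trimmed), `NL` is a stand-in for `\R` (which Python's `re` module lacks), and the function name `paired_lines` is made up for illustration.

```python
import re

# Abbreviated, hypothetical sample: two indented <task> records,
# followed by a one-line reference list at the end of the file.
SAMPLE = """\
<task chapnbr="71" key="TASK-712101100800">
  <subtask key="SUBTASK-712101860003">
    <para>para</para>
    <para>TASK 71-21-01-100-80 para continues.</para>
  </subtask>
</task>
<task chapnbr="71" key="TASK-712101100801">
  <subtask key="SUBTASK-712101860003">
    <para>TASK 71-21-01-200-801 para continues.</para>
  </subtask>
</task>
TASK 71-21-01-100-80
"""

NL = r"(?:\r\n|\n|\r)"  # assumption: replaces \R from the Notepad++ regex


def paired_lines(text):
    # Step 1: join indented lines to the previous line, drop each </task> line.
    text = re.sub(NL + r"\x20+|</task>" + NL + "?", "", text)
    # Step 2: reduce each record to "<reference><TAB><key>".
    text = re.sub(r'^.+?key="(TASK-\d+).+?<para>(TASK\x20[^\x20]+).+',
                  r"\2\t\1", text, flags=re.MULTILINE)
    # Step 3: lexicographic ascending sort of the non-empty lines.
    lines = sorted(line for line in text.splitlines() if line)
    # Steps 4 and 5: keep only pairs where a line is repeated at the
    # start of the next line (the "marked" lines); drop everything else.
    kept, i = [], 0
    while i < len(lines) - 1:
        first, second = lines[i], lines[i + 1]
        if re.fullmatch(r"TASK\x20[^\x20]+", first) and second.startswith(first):
            kept += [first, second]
            i += 2
        else:
            i += 1
    return kept


for line in paired_lines(SAMPLE):
    print(line)
```

Only the first record survives here, because its reference is the only one present in the trailing list; the second record has no partner after the sort and is dropped.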
-
-
Hello, @ganesan-govindarajan, @terry-r and All,
How are you? Fine, I hope! Although I may have completely misunderstood your goal, I found a very simple solution:
Just follow this road map carefully:
- Open your XML file, containing your list of tasks, in Notepad++
- Delete any possible trailing list of references, located at the very end of your file
=> So, we start with that INPUT text :
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100800" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-100-80 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100801" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-200-801 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>
- Move to the very beginning of your XML file
- Open the Mark dialog ( Ctrl + M )
- Type in the following regex in the Find what : field :
(?-s)(?<=key=")TASK-.+?(?=")|(?<=<para>)TASK\x20[\d-]+
- Untick all box options
- Check the Purge for each search and Match case options
- Select the Regular expression search mode
- Click on the Mark All button
=> You'll get the message : Mark: 4 matches from caret to end-of-file
- Click on the Copy Marked Text button
- Go to the very end of your file ( Ctrl + End )
- Enter a few line-breaks, hitting the Enter key
- Then, paste the clipboard contents ( Ctrl + V )
=> From the INPUT text, you should get this temporary output below, after your XML contents :
TASK-712101100800
TASK 71-21-01-100-80
TASK-712101100801
TASK 71-21-01-200-801
- Now, move the caret to the beginning of this newly pasted text
- Open the Replace dialog ( Ctrl + H )
- Untick all box options
- Type in (?-s)^(.+)\R(.+) in the Find what : field
- Type in \2\t\1 in the Replace with : field
- Click on the Replace All button
=> Here you are ! You'll get the final expected OUTPUT, below, under your XML contents :
TASK 71-21-01-100-80    TASK-712101100800
TASK 71-21-01-200-801    TASK-712101100801

Best Regards,
guy038
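The road map above (mark, copy the marked text, then swap each pair of lines) can be tried out with a short Python sketch. This is only an approximation under stated assumptions: the sample is abbreviated and hypothetical, the leading `(?-s)` is dropped because Python's `re` does not accept a global flag removal (and `.` already excludes line breaks by default), `\R` is replaced by `\n`, and the names `MARK` and `extract_pairs` are made up for illustration.

```python
import re

# Abbreviated, hypothetical sample: most attributes and elements trimmed.
XML = (
    '<task chapnbr="71" key="TASK-712101100800">\n'
    '  <subtask key="SUBTASK-712101860003">\n'
    '    <para>para</para>\n'
    '    <para>TASK 71-21-01-100-80 para continues.</para>\n'
    '  </subtask>\n'
    '</task>\n'
    '<task chapnbr="71" key="TASK-712101100801">\n'
    '  <subtask key="SUBTASK-712101860003">\n'
    '    <para>TASK 71-21-01-200-801 para continues.</para>\n'
    '  </subtask>\n'
    '</task>\n'
)

# Same alternation as in the Mark dialog: either a TASK key inside key="...",
# or a TASK reference at the very start of a <para> element.
MARK = re.compile(r'(?<=key=")TASK-.+?(?=")|(?<=<para>)TASK\x20[\d-]+')


def extract_pairs(xml):
    # "Mark All" followed by "Copy Marked Text": one match per line.
    pasted = "\n".join(MARK.findall(xml))
    # Replace step: swap each pair of lines and join them with a tab.
    return re.sub(r"^(.+)\n(.+)", r"\2\t\1", pasted, flags=re.MULTILINE)


print(extract_pairs(XML))
```

Note that the SUBTASK keys are skipped because the lookbehind requires `key="` to be immediately followed by `TASK-`. The swap relies on every task contributing exactly one key and one reference, in that order; a task without a matching `<para>` would shift all the following pairs.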
-