How to get search text's key value in xml file
-
Hi Team, good day!
I need your help with a Notepad++ regex. I have two inputs, as listed below:
- An XML file containing a list of tasks, each with a key value (key="Task-12345678…")
- A list of references such as Task 12-72-45, Subtask 12-77-22, Graphic 71-22-33, etc., which is placed at the end of the file
I have to find item 2 in item 1 (XML) file and get respective task's key value. Kindly refer to the sample file here. Please help me out with this!
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100800" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-100-80 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100801" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-200-801 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>

List of task here this is the expected result
TASK 71-21-01-100-80 TASK-712101100800
TASK 71-21-01-200-801 TASK-712101100801
. . . .
Thanks
Ganesan G -
@ganesan-govindarajan said in How to get search text's key value in xml file:
I have to find item 2 in item 1 (XML) file and get respective task’s key value. Kindly refer the sample file here. Please help me out on this !!
At first glance this does seem quite complicated. However, if you break it down into a number of steps, it becomes a lot easier to create a regex for each step.
So that's the basis of my solution. There may well be other solutions put forward by other forum members.
1. Make each record (from <task to </task>) be on 1 line. I actually remove the last line (</task>) of each record in the process. This makes no difference to the outcome, as all non-essential data is removed in this solution.
So please only do this on a copy of the file; it is a destructive process.
Using the Replace function (FW = Find what, RW = Replace with):
FW: \R^\x20+|</task>\R?
RW: nothing in this field
2. Grab the 2 key pieces of data, removing all the rest of the record.
Using the Replace function:
FW: (?-s)^.+?key="(TASK-\d+).+?<para>(TASK\x20[^\x20]+).+
RW: ${2}\t${1}
3. Sort the lines using Edit, Line Operations, Sort Lines Lexicographically Ascending.
4. Now check each record and remove it if the number does not appear in the list of records to find. As the sort in step #3 put the tasks to find alongside the data to be found, they will be a pair. This step uses the Mark function to mark the paired lines. Then the unmarked lines are removed.
Using the Mark function:
FW: ^(TASK\x20[^\x20]+)\R\1
Tick "Bookmark line" and click "Mark All". Paired lines will have a sphere/circle icon at the start of the lines. Close the Mark function window.
5. Use Search, Bookmark, Remove Unmarked Lines. So what's left are pairs of lines. The first line of each pair is the value being searched for, the second line is the data found.
At this point I will leave it to you to determine the final look of the file. Since you combined the before look with the after look it was a bit confusing as to exactly what format you wanted. I think the result I have produced is probably close enough.
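For anyone who would rather script it, the five steps above can be sketched roughly as follows in Python. This is only an approximation under stated assumptions: the text uses LF line endings, the sample is a trimmed-down, hypothetical stand-in for the real file, `find_task_keys` is a made-up name, and `\R` is replaced by `\n` because Python's `re` module has no `\R`. The pair test is also slightly stricter than the Mark regex, since it requires a tab right after the repeated reference.

```python
import re

def find_task_keys(text):
    """Mirror the five steps on LF-terminated text; return the kept lines."""
    # Step 1: put each record on one line and drop its closing </task>.
    text = re.sub(r"\n +|</task>\n?", "", text)
    # Step 2: reduce each record to "<reference>\t<key>".
    text = re.sub(
        r'(?m)^.+?key="(TASK-\d+).+?<para>(TASK\x20[^\x20]+).+',
        r"\2\t\1",
        text,
    )
    # Step 3: sort so each reference sits directly above its record.
    lines = sorted(line for line in text.split("\n") if line)
    # Steps 4 and 5: keep only lines that belong to a reference/record pair.
    kept = []
    for ref, rec in zip(lines, lines[1:]):
        if rec.startswith(ref + "\t"):
            kept += [ref, rec]
    return kept

# Hypothetical sample: three tasks, then the list of references to find.
sample = (
    '<task chapnbr="71" key="TASK-712101100800">\n'
    '  <title>title</title>\n'
    '  <para>TASK 71-21-01-100-80 para continues.</para>\n'
    '</task>\n'
    '<task chapnbr="71" key="TASK-712101100801">\n'
    '  <title>title</title>\n'
    '  <para>TASK 71-21-01-200-801 para continues.</para>\n'
    '</task>\n'
    '<task chapnbr="71" key="TASK-712101100999">\n'
    '  <para>TASK 71-21-01-300-999 para continues.</para>\n'
    '</task>\n'
    'TASK 71-21-01-100-80\n'
    'TASK 71-21-01-200-801\n'
)

result = find_task_keys(sample)
print("\n".join(result))
```

The third task is not in the reference list, so its line has no partner after sorting and is dropped, just like an unmarked line in step 5.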
Terry
-
-
Hello, @ganesan-govindarajan, @terry-r and All,
How are you? Fine, I hope! Although I may have completely misunderstood your goal, I found a very simple solution:
Just follow this road map carefully:
- Open your XML file, containing your list of tasks, in Notepad++
- Delete any possible trailing list of references, located at the very end of your file
=> So, we start with this INPUT text:
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100800" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-100-80 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100801" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-200-801 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>
- Move to the very beginning of your XML file
- Open the Mark dialog (Ctrl + M)
- Type in (?-s)(?<=key=")TASK-.+?(?=")|(?<=<para>)TASK\x20[\d-]+ in the Find what: field
- Untick all box options
- Check the Purge for each search and Match case options
- Select the Regular expression search mode
- Click on the Mark All button
=> You'll get the message:
Mark: 4 matches from caret to end-of-file
- Click on the Copy Marked Text button
- Go to the very end of your file (Ctrl + End)
- Enter a few line breaks, hitting the Enter key
- Then, paste the clipboard contents (Ctrl + V)
=> From the INPUT text, you should get this temporary output below, after your XML contents:
TASK-712101100800
TASK 71-21-01-100-80
TASK-712101100801
TASK 71-21-01-200-801
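To check the Mark regex outside Notepad++, here is a small Python sketch. It makes a few assumptions: the sample is a trimmed-down, hypothetical version of the INPUT text, and the leading (?-s) is dropped because Python's dot already stops at line breaks unless `re.DOTALL` is set. The lookbehinds keep `key="` and `<para>` out of the match, and the SUBTASK keys are skipped because `TASK-` must come right after `key="`.

```python
import re

# Same alternation as the Mark regex, minus the leading (?-s).
MARK = re.compile(r'(?<=key=")TASK-.+?(?=")|(?<=<para>)TASK\x20[\d-]+')

# Hypothetical, trimmed-down sample (most attributes and elements omitted).
sample = """\
<task chapnbr="71" key="TASK-712101100800">
  <subtask chapnbr="71" key="SUBTASK-712101860003">
    <para>para</para>
    <para>TASK 71-21-01-100-80 para continues.</para>
  </subtask>
</task>
<task chapnbr="71" key="TASK-712101100801">
  <subtask chapnbr="71" key="SUBTASK-712101860003">
    <para>para</para>
    <para>TASK 71-21-01-200-801 para continues.</para>
  </subtask>
</task>
"""

marked = MARK.findall(sample)
print(len(marked))          # 4, like the "Mark: 4 matches" message
print("\n".join(marked))    # one match per line: key first, then reference
```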
- Now, move the caret to the beginning of this new pasted text
- Open the REPLACE dialog (Ctrl + H)
- Untick all box options
- Type in (?-s)^(.+)\R(.+) in the Find what: field
- Type in \2\t\1 in the Replace with: field
- Click on the Replace All button
=> Here you are! You'll get the final expected OUTPUT, below, under your XML contents:
TASK 71-21-01-100-80	TASK-712101100800
TASK 71-21-01-200-801	TASK-712101100801
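The line-pair swap can be tried in Python as well. This is a sketch, not a drop-in equivalent: Python's `re` module has no `\R`, so the line break is written as `\r?\n`, and `[^\r\n]+` stands in for `.+` so that a stray carriage return is never captured. The name `swap_pairs` is just an illustrative choice.

```python
import re

def swap_pairs(text):
    """Turn each pair of lines 'key / reference' into 'reference<TAB>key'."""
    return re.sub(
        r"^([^\r\n]+)\r?\n([^\r\n]+)",  # two consecutive non-empty lines
        r"\2\t\1",                      # second line, a tab, then first line
        text,
        flags=re.MULTILINE,
    )

temporary = (
    "TASK-712101100800\n"
    "TASK 71-21-01-100-80\n"
    "TASK-712101100801\n"
    "TASK 71-21-01-200-801"
)

swapped = swap_pairs(temporary)
print(swapped)
```

Because each match consumes two whole lines, the search resumes at the third line, so the pairs never overlap.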
Best Regards,
guy038
-