How to get search text's key value in xml file

ganesan govindarajan

Hi Team, good day!

I need your help with a Notepad++ regex. I have two inputs, as listed below:

1. An XML file which is a list of tasks, each with a key value (key="Task-12345678…")
2. A list of references such as Task 12-72-45, Subtask 12-77-22, Graphic 71-22-33, etc., which is placed at the end of the file

I need to find each reference from item 2 in the item 1 (XML) file and get the key value of the task that contains it. Kindly refer to the sample file below. Please help me out with this!

<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100800" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-100-80 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>
<task breaknbr="00" chapnbr="71" chg="U" func="100" key="TASK-712101100801" pgblknbr="601" revdate="20190915" sectnbr="21" seq="800" subjnbr="01">
  <effect effrg="TAY/ALL" efftext="TAY/ALL"></effect>
  <title>title</title>
  <subtask chapnbr="71" chg="U" func="860" key="SUBTASK-712101860003" pgblknbr="601" revdate="20190915" sectnbr="21" seq="003" subjnbr="01">
    <title>Data</title>
    <list1>
      <l1item>
        <para>para</para>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>para</para>
            <para>para</para>
          </unlitem>
        </unlist>
      </l1item>
      <l1item>
        <para>para</para>
        <unlist>
          <unlitem>
            <para>TASK 71-21-01-200-801 para continues.</para>
          </unlitem>
        </unlist>
      </l1item>
    </list1>
  </subtask>
</task>
List of tasks (references)	Expected result
TASK 71-21-01-100-80	TASK-712101100800
TASK 71-21-01-200-801	TASK-712101100801
...

Thanks
Ganesan G

Terry R

@ganesan-govindarajan said in How to get search text's key value in xml file:

I have to find item 2 in item 1 (XML) file and get respective task’s key value. Kindly refer the sample file here. Please help me out on this !!

At first glance it does seem quite complicated. However, if you break it down into a number of steps, it becomes a lot easier to create a regex to perform each step.

So that’s the basis of my solution. There may well be other solutions put forward by other forum members.

1. Put each record (from <task to </task>) on 1 line. I actually remove the last line (</task>) of each record in the process. This makes no difference to the outcome, as all non-essential data is removed in this solution. It is a destructive process, so please only do this on a copy of the file.
Using the Replace function (Search Mode: Regular expression):
FW:\R^\x20+|</task>\R?
RW: leave this field empty
2. Grab the 2 key pieces of data, removing all the rest of the record.
Using the Replace function:
FW:(?-s)^.+?key="(TASK-\d+).+?<para>(TASK\x20[^\x20]+).+
RW:${2}\t${1}
3. Sort the lines using Edit > Line Operations > Sort Lines Lexicographically Ascending.
4. Now check each record and remove it if its number does not appear in the list of references to find. As the sort in step #3 puts each reference to find alongside the data found for it, they will form a pair. This step uses the Mark function to mark the paired lines; the unmarked lines are then removed.
Using the Mark function:
FW:^(TASK\x20[^\x20]+)\R\1
Tick Bookmark line and click Mark All. Paired lines will have a sphere/circle icon at the start of the line. Close the Mark window.
5. Use Search > Bookmark > Remove Unmarked Lines. What's left are pairs of lines: the first line of each pair is the value being searched for, the second line is the data found.
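For anyone who wants to check these five steps outside Notepad++, here is a rough Python sketch of the same sequence using the `re` module. It is only an approximation under stated assumptions: Python's `re` has no `\R`, so `\n` stands in for it (LF line endings assumed); Python's `.` already excludes newlines, which covers `(?-s)`; `sorted()` orders by code point, which may not match Notepad++'s sort in every case; and in step 4 the class is narrowed to `[^\x20\n]` so the capture cannot run across a line break. The trimmed `SAMPLE` text, its one-entry reference list and the `terry_pipeline` name are hypothetical.

```python
import re

# Hypothetical, trimmed sample: the two <task> records from the question plus a
# one-entry reference list at the end of the file (LF line endings assumed).
SAMPLE = (
    '<task chapnbr="71" key="TASK-712101100800" seq="800">\n'
    '  <subtask key="SUBTASK-712101860003">\n'
    '    <para>para</para>\n'
    '    <para>TASK 71-21-01-100-80 para continues.</para>\n'
    '  </subtask>\n'
    '</task>\n'
    '<task chapnbr="71" key="TASK-712101100801" seq="800">\n'
    '  <subtask key="SUBTASK-712101860003">\n'
    '    <para>para</para>\n'
    '    <para>TASK 71-21-01-200-801 para continues.</para>\n'
    '  </subtask>\n'
    '</task>\n'
    'TASK 71-21-01-200-801\n'
)


def terry_pipeline(text: str) -> list[str]:
    """Hypothetical helper approximating the five Notepad++ steps."""
    # Step 1: join indented lines onto the previous line and drop "</task>\n".
    # "\n" stands in for \R; re.M lets ^ match at each line start.
    text = re.sub(r'\n^\x20+|</task>\n?', '', text, flags=re.M)

    # Step 2: reduce each record to "reference<TAB>key".
    text = re.sub(r'^.+?key="(TASK-\d+).+?<para>(TASK\x20[^\x20]+).+',
                  r'\2\t\1', text, flags=re.M)

    # Step 3: sort lines (plain code-point order here).
    lines = sorted(text.splitlines())

    # Step 4: "bookmark" a reference-only line and the following line that
    # starts with the same reference.
    joined = "\n".join(lines)
    marked = set()
    for m in re.finditer(r'^(TASK\x20[^\x20\n]+)\n\1', joined, flags=re.M):
        first = joined.count("\n", 0, m.start())  # index of the first line
        marked.update({first, first + 1})

    # Step 5: remove unmarked lines.
    return [line for i, line in enumerate(lines) if i in marked]


for line in terry_pipeline(SAMPLE):
    print(line)
```

With this sample, the task whose reference is not in the list (71-21-01-100-80) is dropped, and only the 71-21-01-200-801 pair is left.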

At this point I will leave it to you to determine the final look of the file. Since you combined the "before" look with the "after" look, it was a bit confusing exactly what format you wanted. I think the result I have produced is probably close enough.

Terry

guy038

Hello, @ganesan-govindarajan, @terry-r and All,

How are you? Fine, I hope! Although I may have completely misunderstood your goal, I found a very simple solution:

Just follow this road map carefully:

Open your XML file, containing your list of tasks, in Notepad++
Delete the trailing list of references, if any, located at the very end of your file

=> So, we start with this INPUT text:

(the same two task records as in the XML sample of the question above)

Move to the very beginning of your XML file ( Ctrl + Home )
Open the Mark dialog ( Ctrl + M )
Type in (?-s)(?<=key=")TASK-.+?(?=")|(?<=<para>)TASK\x20[\d-]+ in the Find what : field
Untick all the box options
Then tick the Purge for each search and Match case options
Select the Regular expression search mode
Click on the Mark All button

=> You'll get the message "Mark: 4 matches from caret to end-of-file"

Click on the Copy Marked Text button
Go to the very end of your file ( Ctrl + End )
Insert a few line breaks by hitting the Enter key
Then, paste the clipboard contents ( Ctrl + V )

=> From the INPUT text, you should get the temporary output below, after your XML contents:

TASK-712101100800
TASK 71-21-01-100-80
TASK-712101100801
TASK 71-21-01-200-801
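A small Python sketch of what the Mark + Copy Marked Text steps produce, under a few assumptions: the regex is used unchanged apart from dropping `(?-s)` (Python's `.` does not match newlines by default), the trimmed `XML` sample and the `copy_marked_text` helper are hypothetical, and joining the matches with newlines is assumed to mirror the one-match-per-line layout shown above. Both lookbehinds are fixed-width, which Python's `re` module requires.

```python
import re

# Hypothetical, trimmed copy of the two records: only the attributes and
# <para> lines the Mark regex looks at are kept.
XML = (
    '<task key="TASK-712101100800">\n'
    '  <subtask key="SUBTASK-712101860003">\n'
    '    <para>para</para>\n'
    '    <para>TASK 71-21-01-100-80 para continues.</para>\n'
    '  </subtask>\n'
    '</task>\n'
    '<task key="TASK-712101100801">\n'
    '  <subtask key="SUBTASK-712101860003">\n'
    '    <para>TASK 71-21-01-200-801 para continues.</para>\n'
    '  </subtask>\n'
    '</task>\n'
)

# The Mark regex without (?-s). key="SUBTASK-..." is not matched, because the
# lookbehind requires key=" immediately before TASK-.
MARK = re.compile(r'(?<=key=")TASK-.+?(?=")|(?<=<para>)TASK\x20[\d-]+')


def copy_marked_text(text: str) -> str:
    """Hypothetical helper: all matches in document order, one per line."""
    return "\n".join(m.group(0) for m in MARK.finditer(text))


print(copy_marked_text(XML))
```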

Now, move the caret to the beginning of this newly pasted text
Open the REPLACE dialog ( Ctrl + H )
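Untick all box options (in particular Wrap around, so that only the text after the caret is processed)
Make sure the Regular expression search mode is still selected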
Type in (?-s)^(.+)\R(.+) in the Find what : field
Type in \2\t\1 in the Replace with : field
Click on the Replace All button

=> Here you are! You'll get the final expected OUTPUT below, under your XML contents:

TASK 71-21-01-100-80	TASK-712101100800
TASK 71-21-01-200-801	TASK-712101100801
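The final Replace step can be sketched the same way in Python, assuming LF line endings (`\n` in place of `\R`) and that only the pasted block is passed in, which mirrors starting at the caret with Wrap around off. `swap_pairs` is a hypothetical helper name.

```python
import re

# The temporary output pasted after the XML contents (LF line endings assumed).
PASTED = (
    "TASK-712101100800\n"
    "TASK 71-21-01-100-80\n"
    "TASK-712101100801\n"
    "TASK 71-21-01-200-801\n"
)


def swap_pairs(text: str) -> str:
    """Hypothetical helper: ^(.+)\\R(.+) -> \\2\\t\\1 on consecutive line pairs."""
    # re.M makes ^ match at each line start; "." never crosses a newline,
    # so each group captures exactly one line.
    return re.sub(r'^(.+)\n(.+)', r'\2\t\1', text, flags=re.M)


print(swap_pairs(PASTED), end="")
```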

Best Regards,

guy038