search for certain data in a certain tag then copy these records - xml

Momo Hara

Dear all,

I hope to get help with the following, as I have only just started learning the basics of XML.

Using Notepad++ has made this task easier. Thanks a lot for Notepad++.

I want to search for the apostrophe (the ' symbol), but only within a certain tag:
<Name> first name last name </Name>
So even if a ' appears in a street address, I don't want that one found.

Sometimes the ' is used twice or even more often in one name.

I want all Name tags that contain at least one apostrophe.

And, if possible, I would like to know how to copy these records in one click.

I first tried searching for an expression that could help:
>'< : this cannot limit the search to one tag.

I tried others and put them between the tags, but then nothing could be found.

I searched a lot before posting this to you guys (and girls).

Hope to hear from you soon.

Thanks in advance

Terry R

@Momo-Hara said in search for certain data in a certain tag then copy these records - xml:

I want to search for the apostrophe (the ' symbol), but only within a certain tag.

That seems relatively easy to code up in a regular expression; however, without actual examples it will be difficult to do.

Please provide some example lines where you DO want to mark/copy the line, and also some where the multiple occurrences exist but the line should NOT be marked/copied. That will give us the information we need to help you.

To insert the examples, put them in your reply, then highlight ALL the example lines and hit the </> button immediately above the posting window. This places the examples within a black box, preventing the posting engine from potentially mangling/changing the data.

Terry

Momo Hara

Only the Street tag should not be found:

Line 1697:       <Name>B'firstname name </Name>
Line 14405:       <Name>name - d'familyname</Name>
Line 16835:       <Name>D'famillyname claude</Name>
Line 16853:       <Name>d'familhname jozef</Name>
Line 16871:       <Name>d'familyame - van dyck</Name>
Line 16889:       <Name>D'familyname Paul</Name>
Line 16925:       <Name>d'familyname Jose</Name>
Line 21713:       <Name>familyname'S </Name>
Line 29615:       <Name>familyname's first name </Name>
Line 31919:       <Name>first name second name - d'familyname</Name>
Line 33341:       <Name>MW S'familyname, GRETA</Name>
Line 33359:       <Name>MW T'familyname, SOETKIN</Name>
Line 36167:       <Name>O'familyname Peter</Name>
Line 36203:       <Name>o'familyname francis</Name>
Line 42863:       <Name> first name D'family name</Name>
Line 42919:       <Street>RUE DE L'streetname</Street>
Line 42937:       <Street>RUE DE L'streetname</Street>
Line 43009:       <Street>rue de l'streetname</Street>
Line 52675:       <Street>Snoy et d'streetname</Street>


Others not to be found by the expression \>'\<:
```
<Name>first name  '''name'''</Name>
<Name>'t name</Name>
```

Terry R

We will use the “Mark” function, which then lets you perform further actions such as copying the marked lines.

The Mark function (default hotkey Ctrl+M) is under the “Search” menu. Don't select “Mark All” from that menu, as that is something different.
Find What:(?-s)^.+?<name>.+?'
Make sure “Bookmark line” is ticked, that “Match case” is unticked (the regex uses a lowercase <name>), and that the search mode is set to “Regular expression”. Click the “Mark All” button, then close that window. You should now see the lines containing both <name> and a ' highlighted, each with a blue circle in the line number column at the start.

At this point you can copy or cut those lines using “Copy Bookmarked Lines” or “Cut Bookmarked Lines”, found under the “Search” > “Bookmark” menu.
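For anyone who wants to experiment with this pattern outside Notepad++, here is a rough sketch using Python's standard re module. It is a translation under assumptions, not what Notepad++ runs internally: Python's . already stops at line breaks, so (?-s) is dropped; re.MULTILINE lets ^ match at each line start; re.IGNORECASE stands in for an unticked “Match case”. The sample lines are taken from the examples in this thread.

```python
import re

# Python version of the Notepad++ regex (?-s)^.+?<name>.+?'
# (?-s) is omitted because "." never matches a line break in Python by default.
PATTERN = re.compile(r"^.+?<name>.+?'", re.IGNORECASE | re.MULTILINE)

# Sample lines based on the examples posted in this thread.
sample = """\
      <Name>D'familyname Paul</Name>
      <Name>familyname's first name </Name>
      <Street>RUE DE L'streetname</Street>
      <Name>'t name</Name>
      <Name>first name  '''name'''</Name>
"""

# Keep each line that contains a match, similar to bookmarking and copying lines.
marked = [line for line in sample.splitlines() if PATTERN.search(line)]
for line in marked:
    print(line)
```

The <Name>'t name</Name> line is not marked here, because .+? needs at least one character between <Name> and the apostrophe.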

Terry

PS: I'm not sure I completely understand what you meant by “Others not to be found…”; however, your first request was for only the “Name” lines to be searched.

Terry R

@Momo-Hara said in search for certain data in a certain tag then copy these records - xml:

Others not to be found by the expression >'<:
```
<Name>first name  '''name'''</Name>
<Name>'t name</Name>
```

I've had a bit of time to try to understand the above statement. I now think it means you want to exclude lines with:

only the ' character. This is already handled, as ONLY lines containing “<name>” (case insensitive, so it can be Name, nAmE, etc.) are candidates.
multiple ' together. Not yet, but my updated regex below should exclude these.
the ' as the first character after the <name> tag. Not yet, but the updated regex below will exclude that.

So the updated version is:
Find What:(?-s)^.+?<name>[^'<]+?'[^']

I hope that will help.

To give you some understanding of the regex we have:
(?-s) makes . stop at line breaks, so each line is worked on by itself
^.+?<name> find lines that start with one or more characters (here the indentation), which must then be followed by the <name> tag
[^'<]+? keep on matching characters (at least one) as long as they are neither ' nor <
'[^'] find a ' character; the character immediately after it must NOT be another '.
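A quick way to see what [^'<]+? and '[^'] do is to run the updated pattern through Python's re module. This is a sketch under assumptions: (?-s) is dropped because Python's . already stops at line breaks, re.MULTILINE makes ^ match at line starts, and re.IGNORECASE stands in for an unticked “Match case”. The sample lines come from this thread.

```python
import re

# Python version of (?-s)^.+?<name>[^'<]+?'[^']
PATTERN = re.compile(r"^.+?<name>[^'<]+?'[^']", re.IGNORECASE | re.MULTILINE)

sample = """\
      <Name>D'familyname Paul</Name>
      <Name>familyname's first name </Name>
      <Street>RUE DE L'streetname</Street>
      <Name>'t name</Name>
      <Name>first name  '''name'''</Name>
"""

# [^'<]+? needs at least one non-apostrophe character before the ',
# so a name starting with ' is skipped; '[^'] rejects runs like '''.
marked = [line for line in sample.splitlines() if PATTERN.search(line)]
for line in marked:
    print(line)
```

Only the D'familyname and familyname's lines are reported; the 't name and '''name''' lines are both excluded, which is what this version was designed to do.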

Terry

Momo Hara

@Terry-R said in search for certain data in a certain tag then copy these records - xml:

> Find What:(?-s)^.+?<name>.+?'

Dear Terry, thanks a lot.

Using the expression, it found 19 of the 20 names.
There is one name not marked: <Name>'t name</Name>

When I said “Others not to be found by the expression >'<”:

I meant that when I searched for the apostrophe using >'<, the result was incomplete. I expected 20 Name tags.

So the two lines I put at the end were not found by the expression >'<.

Now the expression (?-s)^.+?<name>.+?' finds 19 of the 20 names. The only one left is the name starting with an apostrophe.


Momo Hara

@Terry-R said in search for certain data in a certain tag then copy these records - xml:

(?-s)^.+?<name>[^'<]+?'[^']

OK, I tested the updated expression.

In my previous reply I said I should have 20 names, but it's 21.

The first expression finds 20 of 21; the only one missing is
<Name>'t name</Name>

The second, updated expression finds 19 of 21. It misses
<Name>first name '''name'''</Name>, which the first expression did find, and also the one that both expressions miss:
<Name>'t name</Name>

Terry R

@Momo-Hara said in search for certain data in a certain tag then copy these records - xml:

The first expression finds 20 of 21 , the one only missing is
<Name>'t name</Name>

Since the first expression was closest, I used it and adjusted it for the one case it did not find. So the new regex is:
Find What:(?-s)^.+?<name>([^<]+?|)'
Let's hope that covers all your lines where the ' exists at least once between the <name> and </name> tags.
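The effect of the ([^<]+?|) group, where the empty alternative allows the apostrophe to come straight after <Name>, can be checked with a small Python sketch. As with any translation of a Notepad++ regex, the assumptions are that (?-s) can be dropped (Python's . already stops at line breaks), re.MULTILINE makes ^ match at line starts, and re.IGNORECASE replaces an unticked “Match case”. The sample lines are from this thread.

```python
import re

# Python version of (?-s)^.+?<name>([^<]+?|)'
# The group is made non-capturing; the empty alternative after "|" lets
# the apostrophe be the very first character inside the Name tag.
PATTERN = re.compile(r"^.+?<name>(?:[^<]+?|)'", re.IGNORECASE | re.MULTILINE)

sample = """\
      <Name>D'familyname Paul</Name>
      <Name>familyname's first name </Name>
      <Street>RUE DE L'streetname</Street>
      <Name>'t name</Name>
      <Name>first name  '''name'''</Name>
"""

marked = [line for line in sample.splitlines() if PATTERN.search(line)]
for line in marked:
    print(line)
```

All four Name lines are reported and the Street line is not, because [^<] cannot run past the closing </Name> into another tag.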

Terry

Momo Hara

@Terry-R said in search for certain data in a certain tag then copy these records - xml:

(?-s)^.+?<name>([^<]+?|)'

Thanks a lot, this one found all the names.

I tried the steps to copy the records, but only the Name tag line is being copied.

Is there a method to copy all the records that have an apostrophe in the name?

The goal is to paste them into a separate file.

Terry R

@Momo-Hara said in search for certain data in a certain tag then copy these records - xml:

I tried the steps to copy the records, but only the Name tag line is being copied.

If you ticked the “Bookmark line” option when using Mark, then each line containing the ' will be bookmarked. Using Copy or Cut Bookmarked Lines WILL copy each of those lines in its entirety.

Terry

Momo Hara

@Terry-R

Dear Terry,

That line only contains the Name tag;
the next line has the street,
etc.

It would be great if I could copy the complete record. As only the Name tag line is bookmarked, only that line is copied.

Terry R

@Momo-Hara said in search for certain data in a certain tag then copy these records - xml:

That line only contains the Name tag;
the next line has the street,

You said you ONLY wanted the lines with the <name> tag. Now it seems that line is part of a multi-line record. Your examples did NOT show any “multi-line” records. This is very important; otherwise you are just wasting my time.

You need to explain better and provide better examples of “complete” records so I can help.

Terry

Momo Hara

@Terry-R
Sorry.

Here is an example:

<Customer>
      <Prime></Prime>
      <Name>D'famillyname name  apos can be anywhere (start, middle, end), or more than one</Name>
      <Country>Countrycode</Country>
      <Street>streetname</Street>
      <HouseNumber>29</HouseNumber>
      <ZipCode>9790</ZipCode>
      <Language>1</Language>
      <CurrencyCode>EUR</CurrencyCode>
      <VATCode>0 or other value</VATCode>
      <VATStatus>0 or other value </VATStatus>
      <VATNumber> empty or other value</VATNumber>
      <AccountSale>number</AccountSale>
      <CountryVATNumber>Countrycode</CountryVATNumber>
      <Ventil>4 or other value</Ventil>
      <Status>0</Status>
   </Customer>

When I referred to “these records” in my first question, I meant copying each complete record whose Name tag contains an apostrophe.

My apologies once again.

Terry R

@Momo-Hara said in search for certain data in a certain tag then copy these records - xml:

an example

That's much better. There was no way I could have known that, as you never showed this before; in fact, everything you asked for pointed to single-line records. Now that I have the full picture, I can design a better regex. So I now have:
Find What:(?-s)(^.+\R){2}^.+?<name>([^<]+?|)'.+\R(?s).+?</customer>

There is a proviso with this regex: the <Name> line MUST appear exactly 2 lines below the <customer> line. If this is not the case, it becomes more complex. The remaining portion of the regex will mark all lines after the <name> line, up to and including the closing </customer> line.
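To see how the multi-line pattern picks out whole records, here is a Python sketch using the re module. It is an approximation, not the Notepad++ engine: \R (any line break) is assumed to be covered by \r?\n, the leading (?-s) is dropped because Python's . already stops at line breaks, the trailing (?s) becomes Python's scoped (?s:...) group, and re.IGNORECASE stands in for an unticked “Match case”. The two records are a shortened, made-up version of the example above; the output file name in the comment is only an illustration.

```python
import re

# Python approximation of
#   (?-s)(^.+\R){2}^.+?<name>([^<]+?|)'.+\R(?s).+?</customer>
RECORD = re.compile(
    r"(?:^.+\r?\n){2}"          # two lines before Name: <Customer> and <Prime>
    r"^.+?<name>(?:[^<]+?|)'"   # Name line with an apostrophe inside the tag
    r".+\r?\n"                  # rest of the Name line
    r"(?s:.+?)</customer>",     # everything up to the closing tag, across lines
    re.IGNORECASE | re.MULTILINE,
)

# Shortened sample: the first record has an apostrophe only in Street,
# the second one has it in Name.
SAMPLE = """\
<Customer>
      <Prime></Prime>
      <Name>Smith John</Name>
      <Street>RUE DE L'streetname</Street>
   </Customer>
<Customer>
      <Prime></Prime>
      <Name>D'familyname Paul</Name>
      <Street>streetname</Street>
   </Customer>
"""

records = [m.group(0) for m in RECORD.finditer(SAMPLE)]
output = "\n".join(records) + "\n"
print(output)
# The result could then be saved to a separate file, for example with
# open("apostrophe_records.xml", "w").write(output)
```

Only the second record is returned, from its <Customer> line through </Customer>; the apostrophe in the first record's Street line is ignored because the pattern only looks for it on the line two below the record start.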

Try this and let me know whether it suits or needs further tuning.

Terry

Momo Hara

@Terry-R

Dear Terry,

Correct, each Name tag is the second one in the record.

And this expression did the trick. Thanks a lot.

It would be great if I understood this expression, so that next time I can look for another character, …

You already explained part of it (quoted below). I assume the ^ is the apostrophe ', so if I want to search for something else, do I need a list of expression symbols that match the character I'm looking for? Where can I find a basic explanation of these expressions?

To give you some understanding of the regex we have:
(?-s) work on each line by itself
^.+?<name> find lines starting possibly with some characters then must followed by the <name> tag
[^'<]+? keep on finding characters so long as they are not the ' nor the <
'[^'] find a ' character, then immediately following is NOT allowed to be a ' character.

Thanks a lot again, you really helped me out.

Terry R

I suggest you head over to https://regex101.com/r/G1m32O/1, where I've loaded my regex. As this testing website uses a slightly different engine, I had to add one character, but the benefit of this site is that it explains each portion of the regex in good detail.

I don't actually expect you to understand all of it immediately, but hopefully it has inspired you to go ahead and learn more about regular expressions. Read some of the posts in our FAQ section; there are a lot of links there that can help you learn.

Terry

Terry R

@Momo-Hara said in search for certain data in a certain tag then copy these records - xml:

I assume the ^ is the apostrophe '

No, this character means one of several things depending on where it is used. At the start, right after (?-s) as in (?-s)^, it means the start of a line. When used as the first character inside square brackets, as in [^<], it negates the following character(s), so in this case it matches any character except <.
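The two meanings of ^, and the fact that the apostrophe is simply written as itself, can be tried in a few lines of Python. This sketch uses Python's re module rather than Notepad++'s engine, with re.MULTILINE assumed so that ^ matches at every line start as it does in Notepad++; the text is a made-up two-line example.

```python
import re

text = "<Name>'t name</Name>\nabc"

# ^ outside square brackets is an anchor: "start of a line" (with MULTILINE).
line_starts = re.findall(r"^.", text, re.MULTILINE)   # first character of each line

# ^ as the first character inside [...] negates the class:
# [^<]+ matches runs of characters that are not "<".
not_lt = re.findall(r"[^<]+", "<Name>'t name</Name>")

# The apostrophe has no special meaning; it just matches an apostrophe.
apostrophes = re.findall(r"'", text)

print(line_starts, not_lt, apostrophes)
```

So to search for a different character, you normally just type that character; only special characters such as ^ . + ? [ ] ( ) need escaping with a backslash.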

I do strongly suggest learning regex coding, but remember to start small. It can be daunting to attempt to get in this deep straight away. You need to learn the way you would build a house, starting with a good foundation.

Terry

Momo Hara

OK, many thanks for all the information. Once I start learning regular expressions, I will take it step by step.