Replace xml tagging
-
Hello. Please pardon mistakes here as this is my first post.
I have an XML based technical manual with hundreds of work packages (xml files).
Inside many of these files, there are “link” tagging structures that need to be replaced with an “xref” construct. The references in the links are valid and need to be transferred into the xref structure.
For example:
<link linkaction="immediate" linktype="return" xlink:href="IETM://S50005#S50005-TOOL1"
xreftype="table">
<prompt>Multimeter</prompt>
</link>
needs to be replaced with:
<xref itemid="S50005-TOOL1" wpid="S50005"/>
I did get what appears to be an XSLT based solution, but I’m not sure how to use it.
Here is that solution:
Here is XSLT based solution. Notepad++ has XML Tools plugin for that.
To make input XML well-formed, I had to add a namespace to the root tag.
Input XML
<?xml version="1.0"?>
<root xmlns:xlink="URI">
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50005#S50018-TOOL15" xreftype="table">
<prompt>Multimeter</prompt>
</link>
</root>
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xlink="URI">
  <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="link">
    <xref itemid="{substring-after(@xlink:href, '#')}"
          wpid="{substring-before(substring-after(@xlink:href, '#'),'-')}"/>
  </xsl:template>
</xsl:stylesheet>
Output
<root xmlns:xlink="URI">
<xref itemid="S50005-TOOL1" wpid="S50005" />
<xref itemid="S50018-TOOL15" wpid="S50018" />
</root>
Now, that output does look good.
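For anyone unsure what the two attribute expressions in that stylesheet compute, here is a minimal Python sketch of the same string logic. It is only an illustration, not a replacement for running the XSLT; the function names are made up for this example. Note that the stylesheet takes the wpid from the front of the part after the `#`, not from the part between `//` and `#`.

```python
def substring_after(text: str, sep: str) -> str:
    """Mimic XPath substring-after(): empty string when sep is absent."""
    return text.partition(sep)[2]


def substring_before(text: str, sep: str) -> str:
    """Mimic XPath substring-before(): empty string when sep is absent."""
    head, found, _ = text.partition(sep)
    return head if found else ""


def xref_attributes(href: str) -> dict:
    # Same two expressions as the attribute value templates in the stylesheet.
    item_id = substring_after(href, "#")
    wp_id = substring_before(item_id, "-")
    return {"itemid": item_id, "wpid": wp_id}


print(xref_attributes("IETM://S50005#S50018-TOOL15"))
# {'itemid': 'S50018-TOOL15', 'wpid': 'S50018'}
```

This agrees with the second xref in the Output listing, where the wpid is S50018 even though S50005 sits between `//` and `#`.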
Is it possible to do a simple Search, Find in Files, Find/Replace?
Here are my system details:
Notepad++ v8.1.5 (64-bit)
Build time : Sep 26 2021 - 15:23:23
Path : C:\Program Files\Notepad++\notepad++.exe
Command Line :
Admin mode : OFF
Local Conf mode : OFF
Cloud Config : OFF
OS Name : Windows 10 Pro for Workstations (64-bit)
OS Version : 2009
OS Build : 19042.1237
Current ANSI codepage : 1252
Plugins : ComparePlugin.dll mimeTools.dll NppConverter.dll NppExport.dll XMLTools.dll
-
Hello, @richard-howard and All,
I think that you don’t even need the XSLT based solution. A regex S/R should be enough !
- Open the Replace dialog ( Ctrl + H )
- SEARCH (?-si)<link .+//(.+)#(.+)"(?s:.+?)</link>
- REPLACE <xref itemid="\2" wpid="\1" />
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
- Press the Esc key to close the Replace dialog
Voila !
Best Regards
guy038
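One caveat about the SEARCH expression above: with `(?-s)` a dot does not cross a line break, so everything from `<link ` up to the closing quote of the href has to sit on one line. Below is a small Python sketch of that behaviour. Python's `re` module is not the Boost engine that Notepad++ uses, so this is only an approximation, and both sample strings are made up for the illustration.

```python
import re

# Approximate Python form of the SEARCH expression. Without re.DOTALL a dot
# stops at a line break, which is what (?-s) requests in Notepad++.
FIRST_TRY = re.compile(r'<link .+//(.+)#(.+)"(?s:.+?)</link>')

# Opening tag up to the href on ONE line.
one_line = (
    '<link linkaction="immediate" linktype="return" xlink:href="IETM://S50005#S50005-TOOL1"\n'
    'xreftype="table">\n<prompt>Multimeter</prompt>\n</link>'
)
# Same tag, but the href starts on the SECOND line.
two_lines = (
    '<link linkaction="immediate" linktype="return"\n'
    'xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">\n'
    '<prompt>Multimeter</prompt>\n</link>'
)

print(FIRST_TRY.search(one_line) is not None)   # True
print(FIRST_TRY.search(two_lines) is not None)  # False
```

So "0 occurrences" is what you would expect whenever the xlink:href attribute is wrapped onto its own line.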
-
-
@Richard-Howard
Hello Guy. Thanks for that prompt response!
Copied your Search and Replace strings into my Replace fields.
It doesn’t find the <link string. “0 occurrences were replaced”
I’m loving the hope though!
-
Hi, @richard-howard and All,
Ah… OK ! In my regex, I supposed that the text, beginning with <link ••••• till the string xlink:href="IETM://•••••#••••••••••" was on a single line !
As it can be split in two lines, I found out the right solution. But before providing this new S/R, I need additional information !
Is your INPUT text like this ( case A ) :
<?xml version="1.0"?>
<root xmlns:xlink="URI">
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50005#S50018-TOOL15" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50005#S50022-TOOL723" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50005#S50099-TOOL0" xreftype="table">
<prompt>Multimeter</prompt>
</link>
</root>
So, the string, between // and the # char, is always S50005 and then the OUTPUT would be :
<root xmlns:xlink="URI">
<xref itemid="S50005-TOOL1" wpid="S50005" />
<xref itemid="S50018-TOOL15" wpid="S50005" />
<xref itemid="S50022-TOOL723" wpid="S50005" />
<xref itemid="S50099-TOOL0" wpid="S50005" />
</root>
OR do you mean ( case B ) :
<?xml version="1.0"?>
<root xmlns:xlink="URI">
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50018#S50018-TOOL15" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50022#S50022-TOOL723" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50099#S50099-TOOL0" xreftype="table">
<prompt>Multimeter</prompt>
</link>
</root>
And, in this case, the string, between // and the # char is just repeated right after the # character and, then, the OUTPUT would be :
<root xmlns:xlink="URI">
<xref itemid="S50005-TOOL1" wpid="S50005" />
<xref itemid="S50018-TOOL15" wpid="S50018" />
<xref itemid="S50022-TOOL723" wpid="S50022" />
<xref itemid="S50099-TOOL0" wpid="S50099" />
</root>
Tell me which solution ( A or B ) is desired, or, maybe, another one !
See you later
Cheers,
guy038
-
@guy038
Hey Guy,
It is B. The wpid is included in the Tool number.
Thanks again!
-
@Richard-Howard
Or rather, the wpid is included in the itemid.
Either way, it’s B.
Thank you!
-
Hello, @richard-howard and All,
Finally, after some thought, I realized that the new regex S/R does not care at all about cases A or B, as it just rewrites :
- The part after the # char, as the itemid attribute, whatever its value
- The part before the # char, as the wpid attribute, whatever its value
So, here is this new S/R :
SEARCH
(?-i)<link (?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>
REPLACE
<xref itemid="\2" wpid="\1" />
( Unchanged )
Nevertheless, note that I assume that the part IETM://•••••#••••••••••" is always written on a single line ! In the opposite case just tell me !
Best Regards,
guy038
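To make the "does not care about A or B" point concrete, here is a hedged Python sketch of the same S/R. The `(?-i)` and `(?-s)` switches are dropped because Python's `re` defaults are already case-sensitive and dot-stops-at-newline; otherwise the expression is carried over as is. Python's engine is not Notepad++'s Boost engine, and the sample strings are assumptions built from the examples in this thread.

```python
import re

# Approximate Python form of the second SEARCH / REPLACE pair.
SECOND_TRY = re.compile(r'<link (?s:.+?)//(.+)#(.+?)"(?s:.+?)</link>')
REPLACEMENT = r'<xref itemid="\2" wpid="\1" />'

# Case A: the part before # differs from the prefix of the part after #.
case_a = (
    '<link linkaction="immediate" linktype="return"\n'
    'xlink:href="IETM://S50005#S50018-TOOL15" xreftype="table">\n'
    '<prompt>Multimeter</prompt>\n</link>'
)
# Case B: the part before # is repeated right after the #.
case_b = case_a.replace("IETM://S50005#", "IETM://S50018#")

print(SECOND_TRY.sub(REPLACEMENT, case_a))
# <xref itemid="S50018-TOOL15" wpid="S50005" />
print(SECOND_TRY.sub(REPLACEMENT, case_b))
# <xref itemid="S50018-TOOL15" wpid="S50018" />
```

Group 1 is whatever sits between `//` and `#`, group 2 whatever sits between `#` and the next double quote, so the replacement is the same for both layouts.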
-
-
@guy038 said in Replace xml tagging:
(?-i)<link (?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>
Hello Guy,
I believe the part IETM://*****#********** will always be on one line.
I am still having trouble getting the Find/Replace to find the string.
“Can’t find the text "(?-i)<link (?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>"”
It’s like it’s looking for that literal string, instead of what the coding is intending it to find?
Anyway, I sure appreciate your efforts!
Maybe I’m doing something wrong?
I do have Wrap Around and Regular Expression selected.
Thanks,
Richard
-
Hi, @richard-howard and All,
OK ! So, let’s try with this INPUT text, pasted in a new tab :
<?xml version="1.0"?>
<root xmlns:xlink="URI">
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50018#S50018-TOOL15" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50022#S50022-TOOL723" xreftype="table">
<prompt>Multimeter</prompt>
</link>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S50099#S50099-TOOL0" xreftype="table">
<prompt>Multimeter</prompt>
</link>
</root>
After using the following S/R :
SEARCH
(?-i)<link (?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>
REPLACE
<xref itemid="\2" wpid="\1" />
You should be left with this OUTPUT text :
<?xml version="1.0"?>
<root xmlns:xlink="URI">
<xref itemid="S50005-TOOL1" wpid="S50005" />
<xref itemid="S50018-TOOL15" wpid="S50018" />
<xref itemid="S50022-TOOL723" wpid="S50022" />
<xref itemid="S50099-TOOL0" wpid="S50099" />
</root>
Do you obtain this result ? If yes but you still have some problems with your real text, this means that the general template of your file is slightly different from my example ;-))
Cheers
guy038
-
Ok - I have had some success, but not always.
Apparently it is dependent on the line structuring, which you have suggested earlier.
When I adjusted the text to line up with your example, line-by-line, it works!
The alignment is not always in this form however.
Here is a block of code from one file. As you can see, the text is not lined up consistently.
In this case, the search criteria does find three of these four. It does not find the 3rd one, with the starting link tag at the end of the line.
Any way to allow for this random type structuring?
Progress!
Thanks!!
<testeqp>
<testeqp-setup-item><name>A/B Interface Cable Qty: 4</name><itemref>
<link linkaction=“immediate” linktype=“return”
xlink:href=“IETM://S60005#S60005-TOOL74” xreftype=“table”><promptA/B Interface Cable</prompt></link></itemref></testeqp-setup-item>
<testeqp-setup-item><name>Adapter TDCU J1 (Bradley A3 # 2)</name>
<itemref><link linkaction=“immediate” linktype=“return”
xlink:href=“IETM://S60002#S6A002-coeiitem40” xreftype=“table”><prompt
Adapter TDCU J1 (Bradley A3 # 2) </prompt></link></itemref>
</testeqp-setup-item>
<testeqp-setup-item><name>BRM Personality Adapter </name><itemref><link
linkaction=“immediate” linktype=“return”
xlink:href=“IETM://S60005#S60005-TOOL2” xreftype=“table”><prompt>BRM
Personality Adapter </prompt></link></itemref></testeqp-setup-item>
<testeqp-setup-item><name>Interconnect Device ICD No. 1</name><itemref>
<link linkaction=“immediate” linktype=“return”
xlink:href=“IETM://S60005#S60005-TOOL10” xreftype=“table”><prompt
Interconnect Device ICD No. 1 </prompt></link></itemref>
</testeqp-setup-item>
-
@Richard-Howard said in Replace xml tagging:
As you can see, the text is not lined up consistently.
FYI, we cannot see that. @guy038 has been doing an excellent job of guessing your meaning, but you have not been using the instructions for Formatting Forum Posts, which are linked in the Please Read Before Posting post in the top of this Help wanted section of the forum – and this makes some aspects of the data, like leading spaces or other certain characters, not visible by us. On the other hand, @guy038’s posts have been using this formatting advice, which makes his replies to you really easy to understand, so that you know exactly what data he is using, and can see the regexes stand out in red text, and any indenting or special characters come through correctly. If you were to follow this formatting advice, it would be much easier for us all (including @guy038) to know what your data really is.
----
Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual formatting commands. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data.
Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get.
Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.
-
@PeterJones
Hello Peter. Thank you for that helpful advice. I’ll take a look at the Formatting Forum Posts and try to do a better job.
Sounds like Guy is really going over and above in trying to help me. I appreciate all of the help.
-
@Richard-Howard
Ok - using the Code insert, here is the code sample I am wanting to show.
<testeqp>
<testeqp-setup-item><name>A/B Interface Cable Qty: 4</name><itemref>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S60005#S60005-TOOL74" xreftype="table"><prompt
>A/B Interface Cable</prompt></link></itemref></testeqp-setup-item>
<testeqp-setup-item><name>Adapter TDCU J1 (Bradley A3 # 2)</name>
<itemref><link linkaction="immediate" linktype="return"
xlink:href="IETM://S60002#S6A002-coeiitem40" xreftype="table"><prompt
>Adapter TDCU J1 (Bradley A3 # 2) </prompt></link></itemref>
</testeqp-setup-item>
<testeqp-setup-item><name>BRM Personality Adapter </name><itemref><link
linkaction="immediate" linktype="return"
xlink:href="IETM://S60005#S60005-TOOL2" xreftype="table"><prompt>BRM
Personality Adapter </prompt></link></itemref></testeqp-setup-item>
<testeqp-setup-item><name>Interconnect Device ICD No. 1</name><itemref>
<link linkaction="immediate" linktype="return"
xlink:href="IETM://S60005#S60005-TOOL10" xreftype="table"><prompt
>Interconnect Device ICD No. 1 </prompt></link></itemref>
</testeqp-setup-item>
-
Hello, @richard-howard, @peterjones and All,
First, thanks for providing your INPUT text in a raw form ! However, at the same time, you should have provided your expected OUTPUT text which is :
<testeqp>
<testeqp-setup-item><name>A/B Interface Cable Qty: 4</name><itemref>
<xref itemid="S60005-TOOL74" wpid="S60005" /></itemref></testeqp-setup-item>
<testeqp-setup-item><name>Adapter TDCU J1 (Bradley A3 # 2)</name>
<itemref><xref itemid="S6A002-coeiitem40" wpid="S60002" /></itemref>
</testeqp-setup-item>
<testeqp-setup-item><name>BRM Personality Adapter </name><itemref><xref itemid="S60005-TOOL2" wpid="S60005" /></itemref></testeqp-setup-item>
<testeqp-setup-item><name>Interconnect Device ICD No. 1</name><itemref>
<xref itemid="S60005-TOOL10" wpid="S60005" /></itemref>
</testeqp-setup-item>
Isn’t it ?
Of course, I slightly modified the regex S/R, replacing a single literal space char by the expression \s+ which matches any non-null range of consecutive blank characters ( either \x20, \t, \xA0, \r, \n, \x0B, \f and few others )
So my last version is :
SEARCH
(?-i)<link\s+(?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>
REPLACE
<xref itemid="\2" wpid="\1" />
If you want, in addition, to isolate the part replaced ( <xref itemid="•••••••••" wpid="•••••" /> ), in a single line, change the REPLACE regex with :
REPLACE
\r\n<xref itemid="\2" wpid="\1" />\r\n
( or \n<xref itemid="\2" wpid="\1" />\n if you work on Unix files )
If you expect any other OUTPUT displaying, just tell me !
BR
guy038
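Since the original question mentioned hundreds of work package files, here is a rough Python sketch of the same final S/R applied outside Notepad++. It is an approximation only: Python's `re` is not the Boost engine, the `(?-i)` and `(?-s)` switches are dropped because they match Python's defaults, and `convert_folder`, the `*.xml` pattern and the UTF-8 encoding are assumptions for this example, not something stated in the thread. Try it on a copy of the files first.

```python
import re
from pathlib import Path

# Approximate Python form of the final SEARCH / REPLACE pair.
LINK = re.compile(r'<link\s+(?s:.+?)//(.+)#(.+?)"(?s:.+?)</link>')
XREF = r'<xref itemid="\2" wpid="\1" />'


def convert_text(text: str) -> str:
    """Replace every <link ...>...</link> block with a one-line <xref .../>."""
    return LINK.sub(XREF, text)


def convert_folder(folder: Path, pattern: str = "*.xml") -> int:
    """Rewrite matching files in place; return how many files changed.

    Files are read and written as bytes so the existing line endings are
    kept; UTF-8 is an assumption about the files.
    """
    changed = 0
    for path in sorted(folder.glob(pattern)):
        original = path.read_bytes().decode("utf-8")
        converted = convert_text(original)
        if converted != original:
            path.write_bytes(converted.encode("utf-8"))
            changed += 1
    return changed


# The item that the previous version missed: <link sits at the end of a line.
sample = (
    '<testeqp-setup-item><name>BRM Personality Adapter </name><itemref><link\n'
    'linkaction="immediate" linktype="return"\n'
    'xlink:href="IETM://S60005#S60005-TOOL2" xreftype="table"><prompt>BRM\n'
    'Personality Adapter </prompt></link></itemref></testeqp-setup-item>\n'
)
print(convert_text(sample))
```

The printed line carries `<xref itemid="S60005-TOOL2" wpid="S60005" />` in place of the whole link block, because `\s+` accepts the line break right after `<link`. Inside Notepad++ itself, the Find in Files tab of the same dialog takes the same SEARCH and REPLACE expressions.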
-
Hello @guy038
Despite my stumbling around here and providing you with less than perfect info to work with, I believe you have my issue resolved! This latest edition does seem to be working universally! You have been such a tremendous help. It is so appreciated. And Peter, I do thank you for pointing me in a better direction. Is there a proper way for me to report such outstanding help?
Thanks again.
Richard
-
@Richard-Howard
Guy, I’d like to credit your contribution. Is there a mechanism for that?
You helped me save hours in conversion time!
-
@Richard-Howard said in Replace xml tagging:
@Richard-Howard
Guy, I’d like to credit your contribution. Is there a mechanism for that?
You helped me save hours in conversion time!
Clicking the little ^ by each post will “upvote” the post, giving the author of that post an extra “reputation point”.
Other than that, there’s no other “credit” mechanism or “report outstanding help” in the forum.
But being polite and saying “thank you” (as you have done) are appreciated – probably more than upvotes. :-) (At least, that’s true for me, and I assume for @guy038 as well.)
-
@PeterJones
Thanks Peter. I’m starting to get the hang of it ( I hope :) ).
-
I’m starting to get the hang of it ( I hope :) ).
It worked. Good job. :-)
And a note: one of the things I like about this site, compared to certain stack-based exchange forums, is that this site doesn’t limit you to “one right answer”; many times, there are multiple posts in a discussion that help lead to a final answer, and I like being able to reward them all.