Replace xml tagging

  • R
    Richard Howard
    last edited by Oct 13, 2021, 4:31 PM

    Hello. Please pardon mistakes here as this is my first post.
    I have an XML based technical manual with hundreds of work packages (xml files).
    Inside many of these files, there are “link” tagging structures that need to be replaced with an “xref” construct. The references in the links are valid and need to be transferred into the xref structure.
    For example:
    <link linkaction="immediate" linktype="return" xlink:href="IETM://S50005#S50005-TOOL1"
    xreftype="table">
    <prompt>Multimeter</prompt>
    </link>

    needs to be replaced with:
    <xref itemid="S50005-TOOL1" wpid="S50005"/>

    I did get what appears to be an XSLT based solution, but I’m not sure how to use it.
    Here is that solution:

    Here is XSLT based solution. Notepad++ has XML Tools plugin for that.
    To make input XML well-formed, I had to add a namespace to the root tag.
    Input XML

    <?xml version="1.0"?>
    <root xmlns:xlink="URI">
    <link linkaction="immediate" linktype="return"
    xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">
    <prompt>Multimeter</prompt>
    </link>
    <link linkaction="immediate" linktype="return"
    xlink:href="IETM://S50005#S50018-TOOL15" xreftype="table">
    <prompt>Multimeter</prompt>
    </link>
    </root>
    XSLT

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xlink="URI">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="link">
        <xref itemid="{substring-after(@xlink:href, '#')}" wpid="{substring-before(substring-after(@xlink:href, '#'),'-')}"/>
    </xsl:template>
    

    </xsl:stylesheet>
    Output

    <root xmlns:xlink="URI">
    <xref itemid="S50005-TOOL1" wpid="S50005" />
    <xref itemid="S50018-TOOL15" wpid="S50018" />
    </root>
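
For readers unsure what the stylesheet's `link` template computes, its two attribute expressions can be mirrored in plain Python. This is only a sketch: the function name `xref_from_href` is hypothetical, and the code imitates the XPath `substring-after` / `substring-before` behaviour (empty string when the separator is missing) rather than running any XSLT. Note that `wpid` is taken from the text before the first `-` of the item id, not from the part between `//` and `#`, which is why the second sample above ends up with `wpid="S50018"`.

```python
def xref_from_href(href: str) -> str:
    """Mirror the two XPath expressions used in the stylesheet's link template."""
    # substring-after(@xlink:href, '#'): empty string if '#' is absent
    item_id = href.partition("#")[2]
    # substring-before(item_id, '-'): empty string if '-' is absent
    before, sep, _ = item_id.partition("-")
    wpid = before if sep else ""
    return f'<xref itemid="{item_id}" wpid="{wpid}"/>'


if __name__ == "__main__":
    # The two href values from the sample input
    for href in ("IETM://S50005#S50005-TOOL1", "IETM://S50005#S50018-TOOL15"):
        print(xref_from_href(href))
```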

    Now, that output does look good.
    Is it possible to do a simple Search, Find in Files, Find/Replace?

    Here are my system details:
    Notepad++ v8.1.5 (64-bit)
    Build time : Sep 26 2021 - 15:23:23
    Path : C:\Program Files\Notepad++\notepad++.exe
    Command Line :
    Admin mode : OFF
    Local Conf mode : OFF
    Cloud Config : OFF
    OS Name : Windows 10 Pro for Workstations (64-bit)
    OS Version : 2009
    OS Build : 19042.1237
    Current ANSI codepage : 1252
    Plugins : ComparePlugin.dll mimeTools.dll NppConverter.dll NppExport.dll XMLTools.dll

    • G
      guy038
      last edited by guy038 Oct 13, 2021, 5:58 PM Oct 13, 2021, 5:56 PM

      Hello, @richard-howard and All,

      I think that you don’t even need the XSLT based solution. A regex S/R should be enough !

      • Open the Replace dialog ( Ctrl + H )

        • SEARCH (?-si)<link .+//(.+)#(.+)"(?s:.+?)</link>

        • REPLACE <xref itemid="\2" wpid="\1" />

        • Tick the Wrap around option

        • Select the Regular expression search mode

        • Click on the Replace All button

      • Press the Esc key to close the Replace dialog

      Voila !

      Best Regards

      guy038

      • R
        Richard Howard @Richard Howard
        last edited by Oct 13, 2021, 6:01 PM

        @Richard-Howard
        Hello Guy. Thanks for that prompt response!
        Copied your Search and Replace strings into my Replace fields.
        It doesn’t find the <link string. “0 occurrences were replaced”
        I’m loving the hope though!

        • G
          guy038
          last edited by Oct 13, 2021, 8:48 PM

          Hi, @richard-howard and All,

          Ah… OK ! In my regex, I assumed that everything from <link ••••• up to the string xlink:href="IETM://•••••#••••••••••" was on a single line !
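
The single-line assumption can be checked outside Notepad++ with a small Python sketch. Python's `re` module is not the Boost engine Notepad++ uses, so the pattern below is a translation: the leading `(?-si)` is dropped because Python is already case-sensitive and its `.` excludes `\n` by default. The two sample strings follow the layouts shown earlier in the thread; the constant names are made up for illustration.

```python
import re

# Python rendering of the first SEARCH expression (see the note about (?-si) above)
FIRST = re.compile(r'<link .+//(.+)#(.+)"(?s:.+?)</link>')

# Layout from the original question: the href sits on the same line as <link
SAME_LINE = (
    '<link linkaction="immediate" linktype="return" '
    'xlink:href="IETM://S50005#S50005-TOOL1"\n'
    'xreftype="table">\n'
    '<prompt>Multimeter</prompt>\n'
    '</link>'
)

# Layout from the XSLT sample: the href starts on the line after <link
NEXT_LINE = (
    '<link linkaction="immediate" linktype="return"\n'
    'xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">\n'
    '<prompt>Multimeter</prompt>\n'
    '</link>'
)

if __name__ == "__main__":
    print("same line matches:", FIRST.search(SAME_LINE) is not None)
    print("next line matches:", FIRST.search(NEXT_LINE) is not None)
```

In the second layout there is no `//` on the line that contains `<link `, so `.+//` cannot succeed, which is one plausible reason for a "0 occurrences" result.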

          Since it can be split across two lines, I have worked out a better solution. But before providing this new S/R, I need additional information !


          Is your INPUT text like this ( case A ) :

          <?xml version="1.0"?>
          <root xmlns:xlink="URI">
          <link linkaction="immediate" linktype="return"
          xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">
          <prompt>Multimeter</prompt>
          </link>
          <link linkaction="immediate" linktype="return"
          xlink:href="IETM://S50005#S50018-TOOL15" xreftype="table">
          <prompt>Multimeter</prompt>
          </link>
          <link linkaction="immediate" linktype="return"
          xlink:href="IETM://S50005#S50022-TOOL723" xreftype="table">
          <prompt>Multimeter</prompt>
          </link>
          <link linkaction="immediate" linktype="return"
          xlink:href="IETM://S50005#S50099-TOOL0" xreftype="table">
          <prompt>Multimeter</prompt>
          </link>
          </root>
          

          So, the string, between // and the # char, is always S50005 and then the OUTPUT would be :

          <root xmlns:xlink="URI">
          <xref itemid="S50005-TOOL1" wpid="S50005" />
          <xref itemid="S50018-TOOL15" wpid="S50005" />
          <xref itemid="S50022-TOOL723" wpid="S50005" />
          <xref itemid="S50099-TOOL0" wpid="S50005" />
          </root>
          

          OR do you mean this ( case B ) :

          <?xml version="1.0"?>
          <root xmlns:xlink="URI">
          <link linkaction="immediate" linktype="return"
          xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">
          <prompt>Multimeter</prompt>
          </link>
          <link linkaction="immediate" linktype="return"
          xlink:href="IETM://S50018#S50018-TOOL15" xreftype="table">
          <prompt>Multimeter</prompt>
          </link>
          <link linkaction="immediate" linktype="return"
          xlink:href="IETM://S50022#S50022-TOOL723" xreftype="table">
          <prompt>Multimeter</prompt>
          </link>
          <link linkaction="immediate" linktype="return"
          xlink:href="IETM://S50099#S50099-TOOL0" xreftype="table">
          <prompt>Multimeter</prompt>
          </link>
          </root>
          

          And, in this case, the string, between // and the # char is just repeated right after the # character and, then, the OUTPUT would be :

          <root xmlns:xlink="URI">
          <xref itemid="S50005-TOOL1" wpid="S50005" />
          <xref itemid="S50018-TOOL15" wpid="S50018" />
          <xref itemid="S50022-TOOL723" wpid="S50022" />
          <xref itemid="S50099-TOOL0" wpid="S50099" />
          </root>
          

          Tell me which case ( A or B ) is desired or, maybe, another one !

          See you later

          Cheers,

          guy038

          • R
            Richard Howard @guy038
            last edited by Oct 13, 2021, 9:53 PM

            @guy038
            Hey Guy,
            It is B. The wpid is included in the Tool number.
            Thanks again!

            • R
              Richard Howard @Richard Howard
              last edited by Oct 13, 2021, 9:54 PM

              @Richard-Howard
              Or rather, the wpid is included in the itemid.
              Either way, it’s B.
              Thank you!

              • G
                guy038
                last edited by guy038 Oct 13, 2021, 11:11 PM Oct 13, 2021, 11:10 PM

                Hello, @richard-howard and All,

                Finally, after some thought, I realized that the new regex S/R does not care at all about cases A or B, as it just rewrites :

                • The part after the # char, as the itemid attribute, whatever its value

                • The part before the # char as the wpid attribute, whatever its value


                So, here is this new S/R :

                SEARCH (?-i)<link (?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>

                REPLACE <xref itemid="\2" wpid="\1" />    ( Unchanged )


                Nevertheless, note that I assume that the part IETM://•••••#••••••••••" is always written on a single line ! If that is not the case, just tell me !
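
To illustrate that this S/R treats cases A and B the same way, here is a hedged Python sketch. The pattern is a translation for Python's `re` module (the `(?-i)` and `(?-s)` switches are omitted because they match Python's defaults), and `make_link` is a hypothetical helper that builds a two-line `<link>` element shaped like the samples in this thread.

```python
import re

# Python rendering of the second SEARCH expression: only the spans wrapped in
# (?s:...) may cross line breaks; both captured groups stay on a single line.
SECOND = re.compile(r'<link (?s:.+?)//(.+)#(.+?)"(?s:.+?)</link>')
REPLACEMENT = r'<xref itemid="\2" wpid="\1" />'


def make_link(href: str) -> str:
    """Build a two-line <link> element like the samples (hypothetical helper)."""
    return (
        '<link linkaction="immediate" linktype="return"\n'
        f'xlink:href="{href}" xreftype="table">\n'
        '<prompt>Multimeter</prompt>\n'
        '</link>'
    )


if __name__ == "__main__":
    for href in ("IETM://S50005#S50018-TOOL15",   # case A style
                 "IETM://S50018#S50018-TOOL15"):  # case B style
        print(SECOND.sub(REPLACEMENT, make_link(href)))
```

Group 1 is whatever sits between `//` and `#`, and group 2 is whatever follows `#` up to the next quote, so neither case needs special handling.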

                Best Regards,

                guy038

                • R
                  Richard Howard @guy038
                  last edited by Oct 14, 2021, 1:20 PM

                  @guy038 said in Replace xml tagging:

                  (?-i)<link (?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>

                  Hello Guy,
                  I believe the part IETM://*****#********** will always be on one line.
                  I am still having trouble getting the Find/Replace to find the string.
                  “Can’t find the text "(?-i)<link (?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>"”
                  It’s like it’s looking for that literal string, instead of what the coding is intending it to find?
                  Anyway, I sure appreciate your efforts!
                  Maybe I’m doing something wrong?
                  I do have Wrap Around and Regular Expression selected.
                  Thanks,
                  Richard

                  • G
                    guy038
                    last edited by Oct 14, 2021, 2:26 PM

                    Hi, @richard-howard and All,

                    OK ! So, let’s try with this INPUT text, pasted in a new tab :

                    <?xml version="1.0"?>
                    <root xmlns:xlink="URI">
                    <link linkaction="immediate" linktype="return"
                    xlink:href="IETM://S50005#S50005-TOOL1" xreftype="table">
                    <prompt>Multimeter</prompt>
                    </link>
                    <link linkaction="immediate" linktype="return"
                    xlink:href="IETM://S50018#S50018-TOOL15" xreftype="table">
                    <prompt>Multimeter</prompt>
                    </link>
                    <link linkaction="immediate" linktype="return"
                    xlink:href="IETM://S50022#S50022-TOOL723" xreftype="table">
                    <prompt>Multimeter</prompt>
                    </link>
                    <link linkaction="immediate" linktype="return"
                    xlink:href="IETM://S50099#S50099-TOOL0" xreftype="table">
                    <prompt>Multimeter</prompt>
                    </link>
                    </root>
                    

                    After using the following S/R :

                    SEARCH (?-i)<link (?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>

                    REPLACE <xref itemid="\2" wpid="\1" />

                    You should be left with this OUTPUT text :

                    <?xml version="1.0"?>
                    <root xmlns:xlink="URI">
                    <xref itemid="S50005-TOOL1" wpid="S50005" />
                    <xref itemid="S50018-TOOL15" wpid="S50018" />
                    <xref itemid="S50022-TOOL723" wpid="S50022" />
                    <xref itemid="S50099-TOOL0" wpid="S50099" />
                    </root>
                    

                    Do you obtain this result ? If yes but you still have some problems with your real text, this means that the general template of your file is slightly different from my example ;-))

                    Cheers

                    guy038

                    • R
                      Richard Howard @guy038
                      last edited by Oct 14, 2021, 3:24 PM

                      Ok - I have had some success, but not always.
                      Apparently it is dependent on the line structuring, which you have suggested earlier.
                      When I adjusted the text to line up with your example, line-by-line, it works!
                      The alignment is not always in this form however.
                      Here is a block of code from one file. As you can see, the text is not lined up consistently.
                      In this case, the search criteria does find three of these four. It does not find the 3rd one, with the starting link tag at the end of the line.
                      Any way to allow for this random type structuring?
                      Progress!
                      Thanks!!

                      <testeqp>
                      <testeqp-setup-item><name>A/B Interface Cable Qty: 4</name><itemref>
                      <link linkaction=“immediate” linktype=“return”
                      xlink:href=“IETM://S60005#S60005-TOOL74” xreftype=“table”><prompt

                      A/B Interface Cable</prompt></link></itemref></testeqp-setup-item>
                      <testeqp-setup-item><name>Adapter TDCU J1 (Bradley A3 # 2)</name>
                      <itemref><link linkaction=“immediate” linktype=“return”
                      xlink:href=“IETM://S60002#S6A002-coeiitem40” xreftype=“table”><prompt
                      Adapter TDCU J1 (Bradley A3 # 2) </prompt></link></itemref>
                      </testeqp-setup-item>
                      <testeqp-setup-item><name>BRM Personality Adapter </name><itemref><link
                      linkaction=“immediate” linktype=“return”
                      xlink:href=“IETM://S60005#S60005-TOOL2” xreftype=“table”><prompt>BRM
                      Personality Adapter </prompt></link></itemref></testeqp-setup-item>
                      <testeqp-setup-item><name>Interconnect Device ICD No. 1</name><itemref>
                      <link linkaction=“immediate” linktype=“return”
                      xlink:href=“IETM://S60005#S60005-TOOL10” xreftype=“table”><prompt
                      Interconnect Device ICD No. 1 </prompt></link></itemref>
                      </testeqp-setup-item>

                      • P
                        PeterJones @Richard Howard
                        last edited by Oct 14, 2021, 3:40 PM

                        @Richard-Howard said in Replace xml tagging:

                        As you can see, the text is not lined up consistently.

                        FYI, we cannot see that. @guy038 has been doing an excellent job of guessing your meaning, but you have not been following the instructions for Formatting Forum Posts, which are linked in the Please Read Before Posting post at the top of this Help wanted section of the forum – and this makes some aspects of the data, like leading spaces or certain other characters, invisible to us. On the other hand, @guy038’s posts have been using this formatting advice, which makes his replies to you really easy to understand, so that you know exactly what data he is using, and can see the regexes stand out in red text, and any indenting or special characters come through correctly. If you were to follow this formatting advice, it would be much easier for us all (including @guy038) to know what your data really is.

                        ----

                        Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual formatting commands.

                        To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data.

                        Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get.

                        Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

                        • R
                          Richard Howard @PeterJones
                          last edited by Oct 14, 2021, 3:45 PM

                          @PeterJones
                          Hello Peter. Thank you for that helpful advice. I’ll take a look at the Formatting Forum Posts and try to do a better job.
                          Sounds like Guy is really going over and above in trying to help me. I appreciate all of the help.

                          • R
                            Richard Howard @Richard Howard
                            last edited by Oct 14, 2021, 3:52 PM

                            @Richard-Howard
                            Ok - using the Code insert, here is the code sample I am wanting to show.

                            <testeqp>
                            <testeqp-setup-item><name>A/B Interface Cable Qty: 4</name><itemref>
                            <link linkaction="immediate" linktype="return"
                            xlink:href="IETM://S60005#S60005-TOOL74" xreftype="table"><prompt
                            >A/B Interface Cable</prompt></link></itemref></testeqp-setup-item>
                            <testeqp-setup-item><name>Adapter TDCU J1 (Bradley A3 # 2)</name>
                            <itemref><link linkaction="immediate" linktype="return"
                            xlink:href="IETM://S60002#S6A002-coeiitem40" xreftype="table"><prompt
                            >Adapter TDCU J1 (Bradley A3 # 2) </prompt></link></itemref>
                            </testeqp-setup-item>
                            <testeqp-setup-item><name>BRM Personality Adapter </name><itemref><link
                            linkaction="immediate" linktype="return"
                            xlink:href="IETM://S60005#S60005-TOOL2" xreftype="table"><prompt>BRM
                            Personality Adapter </prompt></link></itemref></testeqp-setup-item>
                            <testeqp-setup-item><name>Interconnect Device ICD No. 1</name><itemref>
                            <link linkaction="immediate" linktype="return"
                            xlink:href="IETM://S60005#S60005-TOOL10" xreftype="table"><prompt
                            >Interconnect Device ICD No. 1 </prompt></link></itemref>
                            </testeqp-setup-item>
                            
                            • G
                              guy038
                              last edited by guy038 Oct 14, 2021, 7:12 PM Oct 14, 2021, 7:00 PM

                              Hello, @richard-howard, @peterjones and All,

                              First, thanks for providing your INPUT text in a raw form ! However, at the same time, you should have provided your expected OUTPUT text which is :

                              <testeqp>
                              <testeqp-setup-item><name>A/B Interface Cable Qty: 4</name><itemref>
                              <xref itemid="S60005-TOOL74" wpid="S60005" /></itemref></testeqp-setup-item>
                              <testeqp-setup-item><name>Adapter TDCU J1 (Bradley A3 # 2)</name>
                              <itemref><xref itemid="S6A002-coeiitem40" wpid="S60002" /></itemref>
                              </testeqp-setup-item>
                              <testeqp-setup-item><name>BRM Personality Adapter </name><itemref><xref itemid="S60005-TOOL2" wpid="S60005" /></itemref></testeqp-setup-item>
                              <testeqp-setup-item><name>Interconnect Device ICD No. 1</name><itemref>
                              <xref itemid="S60005-TOOL10" wpid="S60005" /></itemref>
                              </testeqp-setup-item>
                              

                              Isn’t it ?


                              Of course, I slightly modified the regex S/R, replacing the single literal space char with the expression \s+, which matches any non-empty run of consecutive blank characters ( \x20, \t, \xA0, \r, \n, \x0B, \f and a few others )

                              So my last version is :

                              SEARCH (?-i)<link\s+(?s:.+?)//(?-s)(.+)#(.+?)"(?s:.+?)</link>

                              REPLACE <xref itemid="\2" wpid="\1" />
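
The effect of swapping the literal space for `\s+` can be reproduced with a short Python sketch. Both patterns are translations for Python's `re` module (the `(?-i)` and `(?-s)` switches are left out because they correspond to Python's defaults), the sample is a shortened excerpt of the posted data, and the constant names are illustrative only.

```python
import re

REPLACEMENT = r'<xref itemid="\2" wpid="\1" />'
# Previous version: requires a literal space right after <link
WITH_SPACE = re.compile(r'<link (?s:.+?)//(.+)#(.+?)"(?s:.+?)</link>')
# Last version: \s+ also accepts a line break right after <link
WITH_WS = re.compile(r'<link\s+(?s:.+?)//(.+)#(.+?)"(?s:.+?)</link>')

# Two items from the posted data: the second one ends a line with "<link"
SAMPLE = (
    '<itemref>\n'
    '<link linkaction="immediate" linktype="return"\n'
    'xlink:href="IETM://S60005#S60005-TOOL74" xreftype="table"><prompt\n'
    '>A/B Interface Cable</prompt></link></itemref>\n'
    '<itemref><link\n'
    'linkaction="immediate" linktype="return"\n'
    'xlink:href="IETM://S60005#S60005-TOOL2" xreftype="table"><prompt>BRM\n'
    'Personality Adapter </prompt></link></itemref>\n'
)

if __name__ == "__main__":
    print("matches with literal space:", len(WITH_SPACE.findall(SAMPLE)))
    print("matches with \\s+:", len(WITH_WS.findall(SAMPLE)))
    print(WITH_WS.sub(REPLACEMENT, SAMPLE), end="")
```

The literal-space pattern skips the element whose `<link` is immediately followed by a line break, while the `\s+` pattern picks up both.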


                              If you want, in addition, to isolate the replaced part ( <xref itemid="•••••••••" wpid="•••••" /> ) on its own line, change the REPLACE expression to :

                              REPLACE \r\n<xref itemid="\2" wpid="\1" />\r\n    ( or \n<xref itemid="\2" wpid="\1" />\n if you work on Unix files )

                              If you expect any other OUTPUT layout, just tell me !

                              BR

                              guy038

                              • R
                                Richard Howard @guy038
                                last edited by Oct 14, 2021, 8:01 PM

                                Hello @guy038
                                Despite my stumbling around here and providing you with less than perfect info to work with, I believe you have my issue resolved! This latest edition does seem to be working universally! You have been such a tremendous help. It is so appreciated. And Peter, I do thank you for pointing me in a better direction. Is there a proper way for me to report such outstanding help?
                                Thanks again.
                                Richard

                                • R
                                  Richard Howard @Richard Howard
                                  last edited by Oct 15, 2021, 2:10 PM

                                  @Richard-Howard
                                  Guy, I’d like to credit your contribution. Is there a mechanism for that?
                                  You helped me save hours in conversion time!

                                  • P
                                    PeterJones @Richard Howard
                                    last edited by Oct 15, 2021, 2:32 PM

                                    @Richard-Howard said in Replace xml tagging:

                                    @Richard-Howard
                                    Guy, I’d like to credit your contribution. Is there a mechanism for that?
                                    You helped me save hours in conversion time!

                                    Clicking the little ^ by each post will “upvote” the post, giving the author of that post an extra “reputation point”.

                                    Other than that, there’s no other “credit” mechanism or “report outstanding help” in the forum.

                                    But being polite and saying “thank you” (as you have done) are appreciated – probably more than upvotes. :-) (At least, that’s true for me, and I assume for @guy038 as well.)

                                    • R
                                      Richard Howard @PeterJones
                                      last edited by Oct 15, 2021, 2:34 PM

                                      @PeterJones
                                      Thanks Peter. I’m starting to get the hang of it ( I hope :) ).

                                      • P
                                        PeterJones @Richard Howard
                                        last edited by Oct 15, 2021, 2:36 PM

                                        @Richard-Howard

                                        I’m starting to get the hang of it ( I hope :) ).


                                        It worked. Good job. :-)

                                        And a note: one of the things I like about this site, compared to certain stack-based exchange forums, is that this site doesn’t limit you to “one right answer”; many times, there are multiple posts in a discussion that help lead to a final answer, and I like being able to reward them all.
