Deleting specific elements and children from an entire xml file



  • I have a problem slightly more complex than a recent help request entitled “Deleting specific string from entire xml file”.
    I have a large xml file like “Original.gpx” below, and I wish to use the XML Tools plugin (XSL Transformation) to generate something like “Result.gpx”, also below…

    Original.gpx is …
    <?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
    <gpx version="1.1" creator="OsmAnd+" xmlns="http://www.topografix.com/GPX/1/1" >
    <trk>
    <trkseg>
    <trkpt lat="45.4652056" lon="-73.6961991">
    <ele>27.57</ele>
    <time>2017-05-27T21:13:09Z</time>
    <hdop>4.0</hdop>
    <extensions>
    <speed>24.5</speed>
    </extensions>
    </trkpt>
    <trkpt lat="45.4643226" lon="-73.6958728">
    <ele>30.57</ele>
    <time>2017-05-27T21:13:12Z</time>
    <hdop>3.0</hdop>
    <extensions>
    <speed>25.25</speed>
    </extensions>
    </trkpt>
    </trkseg>
    </trk>
    </gpx>

    Result.gpx is…
    <?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
    <gpx version="1.1" creator="OsmAnd+" xmlns="http://www.topografix.com/GPX/1/1" >
    <trk>
    <trkseg>
    <trkpt lat="45.4652056" lon="-73.6961991">
    <ele>27.57</ele>
    <time>2017-05-27T21:13:09Z</time>
    <hdop>4.0</hdop>
    </trkpt>
    <trkpt lat="45.4643226" lon="-73.6958728">
    <ele>30.57</ele>
    <time>2017-05-27T21:13:12Z</time>
    <hdop>3.0</hdop>
    </trkpt>
    </trkseg>
    </trk>
    </gpx>

    In summary, I want to remove the element <extensions></extensions>, including its children (<speed></speed> or others). I used the XSL file below to transform it but did not have any success. Actually, the transformation does not change anything in the file :-(

    Transform.xsl file is…
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@*|node()">
    <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    </xsl:template>
    <xsl:template match="extension"/>
    </xsl:stylesheet>

    Any idea on what is wrong with it?



  • Hello, @daniel-bégin,

    You’re wrong, Daniel :-)) It is not more difficult, indeed!

    Just use the search regex (?s-i)<extensions>.+?</extensions>\R, with the Regular expression option checked and an empty replacement zone, then click on the Replace All button.


    Notes :

    What does this regex mean? Well:

    • At the beginning, the modifier -i forces the search to be case-sensitive ( => search of the word extensions, with that exact case ) and the modifier s means that the special dot . character matches absolutely any character ( standard AND EOL characters )

    • The parts <extensions> and </extensions> simply match the literal strings <extensions> and </extensions>

    • The regex part .+?, located between, matches the shortest non-empty range of any characters, between the strings <extensions> and </extensions>

    • Finally, the \R syntax, among other characters, matches the EOL character(s) ( Windows \r\n, Unix \n, or old Mac \r ), located after the string </extensions>

    • As the replacement part is empty, every matched range, from the string <extensions> to the string </extensions> included, plus the line break that follows, is deleted
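
For readers who want to try the same idea outside Notepad++, here is a minimal sketch in Python. It is a translation, not the exact regex: Python's re module has no \R and does not accept a leading (?-i), so the sketch assumes re.DOTALL in place of (?s), relies on Python's default case-sensitivity, and spells the line break out as (?:\r\n|\n|\r). The names PATTERN, strip_extensions and sample are hypothetical, and the sample is a shortened stand-in for the GPX file.

```python
import re

# Translation of (?s-i)<extensions>.+?</extensions>\R for Python's re module:
# DOTALL stands in for (?s), matching is case-sensitive by default, and the
# explicit alternation (?:\r\n|\n|\r) stands in for \R.
PATTERN = re.compile(r"<extensions>.+?</extensions>(?:\r\n|\n|\r)", re.DOTALL)


def strip_extensions(text: str) -> str:
    """Delete each <extensions>...</extensions> block and the line break after it."""
    return PATTERN.sub("", text)


# Shortened, hypothetical sample with two track points.
sample = (
    "<trkpt>\n"
    "<hdop>4.0</hdop>\n"
    "<extensions>\n"
    "<speed>24.5</speed>\n"
    "</extensions>\n"
    "</trkpt>\n"
    "<trkpt>\n"
    "<hdop>3.0</hdop>\n"
    "<extensions>\n"
    "<speed>25.25</speed>\n"
    "</extensions>\n"
    "</trkpt>\n"
)

print(strip_extensions(sample))
```

Because the quantifier is lazy, each block is removed on its own and the <hdop> lines of both track points survive.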

    Cheers,

    guy038

    P.S. :

    Just note that the slightly different regex (?s-i)<extensions>.+</extensions>\R, without the question mark, would select the longest range of characters between two strings <extensions> and </extensions>.

    So, this range would start at the first string <extensions> of your file and end at the last string </extensions> of your file !
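
The difference between the lazy and greedy forms can be checked with a small Python sketch. As an assumption of this sketch, re.DOTALL replaces (?s) and (?:\r\n|\n|\r) replaces \R, since Python's re module does not support \R; the text variable is a made-up miniature, not the real GPX file.

```python
import re

# Hypothetical miniature input: two separate <extensions> blocks,
# with the letter B sitting between them.
text = "A<extensions>1</extensions>\nB<extensions>2</extensions>\nC"

# Lazy quantifier (.+?): stops at the first closing tag it can reach.
lazy = re.compile(r"<extensions>.+?</extensions>(?:\r\n|\n|\r)", re.DOTALL)
# Greedy quantifier (.+): runs on to the last closing tag in the text.
greedy = re.compile(r"<extensions>.+</extensions>(?:\r\n|\n|\r)", re.DOTALL)

lazy_result = lazy.sub("", text)
greedy_result = greedy.sub("", text)

print(repr(lazy_result))    # 'ABC': each block removed separately
print(repr(greedy_result))  # 'AC': everything between first and last tag is gone
```

In the greedy case the B between the two blocks is swallowed as well, which is why the lazy form is the one to use on a file with many track points.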



  • Thanks, guy038. Clever! I’ll use it :-)

    However (in case I have a really more complex manipulation to do!), is it possible to do the above using an XSL transformation from the XML Tools plugin?