Deleting specific elements and children from an entire xml file
-
I have a problem slightly more complex than a recent help request entitled “Deleting specific string from entire xml file”.
I have a large XML file like “Original.gpx” below, and I wish to use the XML Tools plugin (XSL Transformation) to generate something like “Result.gpx”, also below.
Original.gpx is:
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<gpx version="1.1" creator="OsmAnd+" xmlns="http://www.topografix.com/GPX/1/1" >
<trk>
<trkseg>
<trkpt lat="45.4652056" lon="-73.6961991">
<ele>27.57</ele>
<time>2017-05-27T21:13:09Z</time>
<hdop>4.0</hdop>
<extensions>
<speed>24.5</speed>
</extensions>
</trkpt>
<trkpt lat="45.4643226" lon="-73.6958728">
<ele>30.57</ele>
<time>2017-05-27T21:13:12Z</time>
<hdop>3.0</hdop>
<extensions>
<speed>25.25</speed>
</extensions>
</trkpt>
</trkseg>
</trk>
</gpx>
Result.gpx is:
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<gpx version="1.1" creator="OsmAnd+" xmlns="http://www.topografix.com/GPX/1/1" >
<trk>
<trkseg>
<trkpt lat="45.4652056" lon="-73.6961991">
<ele>27.57</ele>
<time>2017-05-27T21:13:09Z</time>
<hdop>4.0</hdop>
</trkpt>
<trkpt lat="45.4643226" lon="-73.6958728">
<ele>30.57</ele>
<time>2017-05-27T21:13:12Z</time>
<hdop>3.0</hdop>
</trkpt>
</trkseg>
</trk>
</gpx>
In summary, I want to remove the element <extensions></extensions>, including its children (<speed></speed> or others). I used the XSL file below to transform it, but without success: the transformation does not change anything in the file :-(
Transform.xsl file is…
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="extension"/>
</xsl:stylesheet>
Any idea on what is wrong with it?
-
Hello, @daniel-bégin,
You’re wrong, Daniel :-)) Not more difficult, indeed!
Just use this search regex
(?s-i)<extensions>.+?</extensions>\R
, with the Regular expression option checked and an empty replacement zone, then click on the Replace All button.
Notes:
What does this regex mean? Well:
- At the beginning, the modifier -i forces the search to be case-sensitive ( => search of the word extensions, with that exact case ) and the modifier s means that the special dot . character matches absolutely any character ( standard AND EOL characters )
- The parts <extensions> and </extensions> simply match the literal strings <extensions> and </extensions>
- The regex part .+?, located between them, matches the shortest non-empty range of any characters between the strings <extensions> and </extensions>
- Finally, the \R syntax, among other characters, matches the EOL character(s) ( Windows \r\n, or Unix \n ) located after the string </extensions>
- As the replacement part is empty, all complete lines between the strings <extensions> and </extensions>, included, are deleted
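For anyone who wants to check this behaviour outside Notepad++, here is a minimal sketch in Python. It is an approximation, not the same engine: Python's re module has no \R, so the line break is spelled out by hand, and (?s-i) is written as (?s) because Python matching is already case-sensitive by default. The name strip_extensions and the sample text are made up for the illustration.

```python
import re

# Python's re has no \R, so the line break is listed explicitly
# (assumption: the file only uses \r\n, \n or \r line endings).
# (?s) lets the dot match line breaks; matching is case-sensitive by default.
LAZY = re.compile(r"(?s)<extensions>.+?</extensions>(?:\r\n|\n|\r)")


def strip_extensions(text: str) -> str:
    """Delete each <extensions>...</extensions> block and its trailing line break."""
    return LAZY.sub("", text)


# Hypothetical, shortened sample in the spirit of Original.gpx
sample = (
    "<trkpt>\n"
    "<hdop>4.0</hdop>\n"
    "<extensions>\n"
    "<speed>24.5</speed>\n"
    "</extensions>\n"
    "</trkpt>\n"
    "<trkpt>\n"
    "<hdop>3.0</hdop>\n"
    "<extensions>\n"
    "<speed>25.25</speed>\n"
    "</extensions>\n"
    "</trkpt>\n"
)

cleaned = strip_extensions(sample)
print(cleaned)
```

Each block is removed separately because .+? stops at the nearest </extensions>, so the <trkpt> lines around the blocks are left alone.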
Cheers,
guy038
P.S. :
Just notice that the slightly different regex
(?s-i)<extensions>.+</extensions>\R
, without the question mark, would select the longest range of characters between two strings <extensions> and </extensions>. So, this range would start at the first string <extensions> of your file and end at the last string </extensions> of your file!
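The difference between the two quantifiers can be seen in a small Python sketch. This is only an approximation of the Notepad++ search: \R is not available in Python's re module, so \r?\n stands in for it, and the <keep/> line is a made-up marker placed between two blocks.

```python
import re

# \R does not exist in Python's re, so \r?\n is used instead (assumption:
# Windows or Unix line endings only). (?s) lets the dot match line breaks.
lazy = re.compile(r"(?s)<extensions>.+?</extensions>\r?\n")
greedy = re.compile(r"(?s)<extensions>.+</extensions>\r?\n")

# Two blocks with a hypothetical line in between that must survive
sample = (
    "<extensions>\n<speed>24.5</speed>\n</extensions>\n"
    "<keep/>\n"
    "<extensions>\n<speed>25.25</speed>\n</extensions>\n"
)

lazy_result = lazy.sub("", sample)      # removes the two blocks one by one
greedy_result = greedy.sub("", sample)  # one match, from the first opening tag to the last closing tag

print(repr(lazy_result))
print(repr(greedy_result))
```

With the lazy form the <keep/> line is preserved; with the greedy form it is swallowed together with everything else between the first and the last tag.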
-
Thanks, guy038. Clever! I’ll use it :-)
However (in case I have a really more complex manipulation to do! :-), is it possible to do the above using an XSL transformation from the XML Tools plugin?
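A note on why the stylesheet may change nothing: the <gpx> root declares a default namespace (xmlns="http://www.topografix.com/GPX/1/1"), so every element in the file belongs to that namespace. In XSLT 1.0 an unprefixed name in a match pattern refers to an element in no namespace, so such a pattern would not be expected to match here; the element is also called extensions, not extension. The usual approach is to bind a prefix to the GPX namespace in the stylesheet and match the prefixed name. The sketch below shows the same namespace effect with Python's ElementTree instead of XSLT, so it is an illustration of the idea and not a stylesheet for the plugin; remove_extensions and the shortened sample are made up for the example.

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"
ET.register_namespace("", GPX_NS)  # serialize GPX elements without a prefix


def remove_extensions(gpx_text: str) -> str:
    """Return the GPX text without any <extensions> element or its children."""
    root = ET.fromstring(gpx_text)
    target = f"{{{GPX_NS}}}extensions"  # namespace-qualified tag name
    # Snapshot both levels so nothing is removed from a list being iterated.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == target:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, shortened sample in the spirit of Original.gpx
sample = (
    '<gpx version="1.1" creator="OsmAnd+" xmlns="http://www.topografix.com/GPX/1/1">'
    "<trk><trkseg>"
    '<trkpt lat="45.4652056" lon="-73.6961991">'
    "<ele>27.57</ele><hdop>4.0</hdop>"
    "<extensions><speed>24.5</speed></extensions>"
    "</trkpt>"
    "</trkseg></trk></gpx>"
)

result = remove_extensions(sample)

# An unqualified name finds nothing, much like a pattern without a namespace prefix.
unqualified_hits = ET.fromstring(sample).findall(".//extensions")
qualified_hits = ET.fromstring(sample).findall(f".//{{{GPX_NS}}}extensions")

print(result)
```

The unqualified lookup returns no element while the qualified one finds the <extensions> block, which is the same reason a template with an unprefixed name would silently do nothing.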