XML - How to sort based on <id>, automatically calculated by formula

Tư Mã Tần Quảng

Hello, I want to quickly rearrange the order of the content in my XML file by <id>, which is currently being confused, for example, below:

<?xml version="1.0" encoding="utf-8"?>
<include>
	<data>
		<id>1</id>
		<name>John Smith</name>
		<type>Intern</type>
		<month>3</month>
		<bsalary>50$</bsalary>
		<asalary>78$</asalary>
		<msalary>100$</msalary>
	</data>
	<data>
		<id>3</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>
	<data>
		<id>2</id>
		<name>Rino Evo</name>
		<type>Employee</type>
		<month>7</month>
		<bsalary>300$</bsalary>
		<asalary>480$</asalary>
		<msalary>600$</msalary>
	</data>
...(truncate)...
	<data>
		<id>XXX (many)</id>
		<name>XXXXXXX</name>
		<type>XX</type>
		<month>XX</month>
		<bsalary>XXX$</bsalary>
		<asalary>XXX$</asalary>
		<msalary>XXX$</msalary>
	</data>
</include>

Now I want to rearrange all the data by <id>, in ascending order (1, 2, 3, …). Is there any way to implement it?
Also, I want to quickly change the values of bsalary, asalary, msalary, … according to a given formula. For example:

New <bsalary> = month * 125.67 WHERE <type> = Intern AND month >= 3
New <bsalary> = month * 145.33 WHERE <type> = Staff  AND month >= 3
New <bsalary> = month * 169.69% WHERE <type> =  Employee AND month >= 3
New <asalary> = month * 188.28 / 235.49 / 264.67% WHERE <type> = Intern / Staff / Employee  AND month >= 6 (as above, but I write briefly, but if it is possible to quickly change all 3 at the same time, it's very good)
New <msalary> = month * 245.45 / 306.11 / 384.32% WHERE <type> = Intern / Staff / Employee  AND month >= 12 (as above)

My English is very bad, I hope you can help me.
Thank you!

Ekopalypse

@Tư-Mã-Tần-Quảng

I have no experience doing this but I got the impression that this might be possible by using XSLT like described here.

Sofistanpp

Hi @Tư-Mã-Tần-Quảng

Notepad++ has a sort feature, but can’t directly sort the quoted format. So, as a workaround, the idea is to reformat that text in order to be able to apply the kind of sort provided by Notepad++. This is doable via a couple of search and replace in regex mode.

As a proof of concept, copy and place the following input data in a new window:

	<data>
		<id>1</id>
		<name>John Smith</name>
		<type>Intern</type>
		<month>3</month>
		<bsalary>50$</bsalary>
		<asalary>78$</asalary>
		<msalary>100$</msalary>
	</data>
	<data>
		<id>3</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>
	<data>
		<id>5</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>
	<data>
		<id>4</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>
	<data>
		<id>2</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>

Then, taking care to put the caret at the very beginning of the document, apply the following regex (all options deselected, except the regex mode button):

Search:		(?s)^\t<data>\R\t{2}<id>(\d+)</id>.*?\t</data>\R
Replace:	$1$0

The regex places a copy of the id number before the data label:

1	<data>
		<id>1</id>
		<name>John Smith</name>
		<type>Intern</type>
		<month>3</month>
		<bsalary>50$</bsalary>
		<asalary>78$</asalary>
		<msalary>100$</msalary>
	</data>
3	<data>
		<id>3</id>

etc.

Now, the idea is to replace all the CRs with a character not present in the input data —I will use a bullet (•), but you can use whatever you like— in order to convert each multilineal data label in an single line.

Apply the following regex under the same conditions as before:

Search:		(?s)\R(?!\d)
Replace:	•

Output:

1	<data>•		<id>1</id>•		<name>John Smith</name>•		<type>Intern</type>•		<month>3</month>•		<bsalary>50$</bsalary>•		<asalary>78$</asalary>•		<msalary>100$</msalary>•	</data>
3	<data>•		<id>3</id>•		<name>Granny</name>•		<type>Staff</type>•		<month>12</month>•		<bsalary>400$</bsalary>•		<asalary>648$</asalary>•		<msalary>800$</msalary>•	</data>

etc.

Note: The precedent regexes can be combined in a single one, though the result is not as nice as the others:

Search:		(?s)(.+?)\R(\t{2}<id>(\d+?)</id>)\R(.+?)\R(.+?)\R(.+?)\R(.+?)\R(.+?)\R(.+?)\R(.+?)\R
Replace:	$3$1•$2•$4•$5•$6•$7•$8•$9•$10\r\n

Once you are here, proceed to sort the result lines via the Edit menu.

To finish the task, we have to get the data text to its original format preserving, of course, the new order. A simple regex will do that:

Search:		(?-s)(•)|(^\d+)
Replace:	?1\r\n:

Output:

1	<data>
		<id>1</id>
		<name>John Smith</name>
		<type>Intern</type>
		<month>3</month>
		<bsalary>50$</bsalary>
		<asalary>78$</asalary>
		<msalary>100$</msalary>
	</data>
2	<data>
		<id>2</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>
3	<data>
		<id>3</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>
4	<data>
		<id>4</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>
5	<data>
		<id>5</id>
		<name>Granny</name>
		<type>Staff</type>
		<month>12</month>
		<bsalary>400$</bsalary>
		<asalary>648$</asalary>
		<msalary>800$</msalary>
	</data>

That’s all. Hope you find it useful.

Alan Kilborn

@Sofistanpp said in XML - How to sort based on <id>, automatically calculated by formula:

?1\r\n:

Good solution. The final replace value threw me; it does indeed work but I would be more familiar with it if it were written (?1\r\n) which of course also works.

Are you going to tackle this part of the OP’s request next? :-)

Also, I want to quickly change the values of bsalary, asalary, msalary, … according to a given formula.

Sofistanpp

Hi @Alan-Kilborn

Thank you. The expression you mentioned is the one I usually use in case of an alternate search replacement, and didn’t think too much when times comes. If it works, it’s fine.

Concerning your last question, no, ha ha, I am not going to even look at that request. I considered the sort issue because Notepad++ has a set of commands that can deal with it, so the question is on topic. This is not the case with the calcs requested. I know you know that Notepad++ lacks mathematical features, so it is easier to script those calcs. Regex can simulate some basic mathematical operations via pre-built tables for adding, subtracting, counting, but at first sight it seems to me that in this case this approach is not applicable.

Maybe I’m wrong and someone else shows us the way to regex heavens.

Best Regards.

guy038

Hello, @tư-mã-tần-quảng, @Ekopalypse, @sofistanpp, @alan-kilborn,

@sofistanpp, your idea of using a dummy character ( • ) to change a multi-line area in a a list of single lines is quite pertinent !

Now, allow me to tell you that your 3 regex S/R may, even, be shortened !

Regarding the first S/R, it’s almost obvious. Just give it a try !
- SEARCH ^\h*<data>\R\h+<id>(\d+)
- REPLACE $1$0
Regarding the second S/R, the in-line modifier (?s) is useless, too because your regex does not contain any dot char ., anyway ! So :
- SEARCH \R(?!\d)
- REPLACE •
Regarding the third S/R :
- You do not need to create group(s), which are not used in replacement or, further on, in the search regex !
- The in-line modifier (?-s) is useless, for the same reason as before
- The general syntax of a conditional replacement is (?{###}Part IF group ### EXIST:Part IF group ### does NOT exist)
  - The { and } brace symbols are not mandatory if the part THEN does not begin with a number
  - The two parts THEN and ELSE are separated with a colon : symbol
  - If no part ELSE exists, the colon is not necessary
  - If part THEN and/or ELSE contain literal ( and/or ) parentheses, they must be escaped with a \ backslash character, so the syntaxes $ and $
  - If only 1 conditional replacement occurs, in replacement, you may omit the starting and ending parentheses !

So, the following syntax is enough :

SEARCH ^\d+|(•)
REPLACE ?1\r\n

Best Regards,

guy038

P.S. :

As @sofistanpp said, the type of calculations, with conditions, wanted by @tư-mã-tần-quảng, are quite inaccessible with regular expressions and only achievable by a scripting language ;-))

dinkumoil

@Tư-Mã-Tần-Quảng

You could install the XML Tools plugin and use its XSLT Transformation feature.

Open Plugins Admin and install XML Tools plugin.
After Notepad++ has been restarted paste the XSLT code from below to an empty tab and save it for example as Transform.xsl.
Open the XML file you want to process.
Go to (menu) Plugins -> XML Tools -> XSLT Transformation.
In the dialog popping up click on the ellipsis button and in the file selector dialog popping up select the XSL file you created in step 2.
Click the Transform button.
A new document will be opened with your changes applied. Save it to the original file.

In the following the XSLT code to do your requested transformations. I guess you want to increase the value of the asalary, bsalary and msalary tags by a percentage value, thus I divided the factors you provided by 100.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output
    method="xml"
    indent="yes"
    encoding="utf-8"
    omit-xml-declaration="no"
  />

  <!-- Copy all nodes not affected by the rules below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Sort based on id node -->
  <xsl:template match="include">
    <xsl:copy>
      <xsl:apply-templates select="data">
        <xsl:sort select="id"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

  <!-- Recalculate bsalary node -->
  <xsl:template match="/include/data/bsalary">
    <xsl:choose>
      <xsl:when test="../type[text()]='Intern' and ../month[text()]>=3">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 1.2567"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <xsl:when test="../type[text()]='Staff' and ../month[text()]>=3">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 1.4533"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <xsl:when test="../type[text()]='Employee' and ../month[text()]>=3">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 1.6969"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <!-- Copy all nodes not affected by the rules above -->
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Recalculate asalary node -->
  <xsl:template match="/include/data/asalary">
    <xsl:choose>
      <xsl:when test="../type[text()]='Intern' and ../month[text()]>=6">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 1.8828"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <xsl:when test="../type[text()]='Staff' and ../month[text()]>=6">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 2.3549"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <xsl:when test="../type[text()]='Employee' and ../month[text()]>=6">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 2.6467"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <!-- Copy all nodes not affected by the rules above -->
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Recalculate msalary node -->
  <xsl:template match="/include/data/msalary">
    <xsl:choose>
      <xsl:when test="../type[text()]='Intern' and ../month[text()]>=12">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 2.4545"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <xsl:when test="../type[text()]='Staff' and ../month[text()]>=12">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 3.0611"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <xsl:when test="../type[text()]='Employee' and ../month[text()]>=12">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:value-of select="substring(text(), 1, string-length(text()) - 1) * 3.8432"/>
          <xsl:text>$</xsl:text>
        </xsl:copy>
      </xsl:when>

      <!-- Copy all nodes not affected by the rules above -->
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>