Search for multiple words in XML



  • Hi all,
    I have an XML file with 2,600,000 lines. The file describes a number of products. Each product is described as follows:

    <PRODUCT mode="new">
    <SUPPLIER_PID>285129</SUPPLIER_PID>
    <PRODUCT_DETAILS>
    <DESCRIPTION_SHORT lang="pol"></DESCRIPTION_SHORT>
    <DESCRIPTION_LONG lang="pol"></DESCRIPTION_LONG>
    <EAN></EAN>
    <SUPPLIER_ALT_PID></SUPPLIER_ALT_PID>
    <MANUFACTURER_PID></MANUFACTURER_PID>
    <MANUFACTURER_NAME></MANUFACTURER_NAME>
    <MANUFACTURER_TYPE_DESCR></MANUFACTURER_TYPE_DESCR>
    <SPECIAL_TREATMENT_CLASS type="NOT_RELEVANT">NONE</SPECIAL_TREATMENT_CLASS>
    <KEYWORD lang="pol"></KEYWORD>
    </PRODUCT_DETAILS>
    <PRODUCT_ORDER_DETAILS>
    <ORDER_UNIT>C62</ORDER_UNIT>
    <CONTENT_UNIT>C62</CONTENT_UNIT>
    <NO_CU_PER_OU>1</NO_CU_PER_OU>
    <PRICE_QUANTITY>1</PRICE_QUANTITY>
    <QUANTITY_MIN>1</QUANTITY_MIN>
    <QUANTITY_INTERVAL>1</QUANTITY_INTERVAL>
    </PRODUCT_ORDER_DETAILS>
    <PRODUCT_PRICE_DETAILS>
    <DATETIME>
    <DATE>2016-01-26</DATE>
    </DATETIME>
    <PRODUCT_PRICE>
    <PRICE_AMOUNT></PRICE_AMOUNT>
    <PRICE_CURRENCY>EUR</PRICE_CURRENCY>
    <TAX>0.19</TAX>
    <LOWER_BOUND>1</LOWER_BOUND>
    </PRODUCT_PRICE>
    </PRODUCT_PRICE_DETAILS>
    </PRODUCT>

    The line <SUPPLIER_PID>285129</SUPPLIER_PID> gives the product number. I need an easy way to find several hundred product numbers in this file and remove every line belonging to each of those products (everything between <PRODUCT mode="new"> and </PRODUCT>). Product numbers are never repeated in my XML file, so I would like to do this automatically.

    Is there any way of doing this?



  • Not sure I understood this correctly: are you trying to remove the PRODUCT elements for specific SUPPLIER_PID values? If so, try this:

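    • Go to Search -> Replace.
    • Search for <PRODUCT mode="new">\s*<SUPPLIER_PID>285129</SUPPLIER_PID>.*?</PRODUCT> (the \s* also tolerates indentation before <SUPPLIER_PID>). To remove several products in one pass, list the numbers as alternatives, e.g. <SUPPLIER_PID>(?:285129|285130)</SUPPLIER_PID>.
    • Replace with nothing.
    • Select Regular Expressions. Make sure ". matches \r and \n" is checked.
    • Hit "Replace all".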
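
With several hundred product numbers, building the search expression by hand gets tedious. The following is a minimal Python sketch of the same regex idea, assuming the whole file fits in memory and that every product starts exactly with <PRODUCT mode="new"> followed by its <SUPPLIER_PID>. The function name remove_products is hypothetical and not part of any tool mentioned here.

```python
import re


def remove_products(xml_text, supplier_pids):
    """Delete every <PRODUCT mode="new"> block whose SUPPLIER_PID is listed.

    Assumption: SUPPLIER_PID is the first child of PRODUCT, as in the sample.
    """
    if not supplier_pids:
        return xml_text
    # Build one alternation such as 285129|285130|... from the given numbers.
    ids = "|".join(re.escape(pid) for pid in supplier_pids)
    pattern = re.compile(
        r'<PRODUCT mode="new">\s*'
        r"<SUPPLIER_PID>(?:" + ids + r")</SUPPLIER_PID>"
        r".*?</PRODUCT>\s*",  # lazy match up to the first closing tag
        re.DOTALL,            # same effect as ". matches \r and \n"
    )
    return pattern.sub("", xml_text)
```

The trailing \s* also removes the line break after </PRODUCT>, so no empty lines are left behind.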

    But if you have to do this kind of job on a regular basis, you may want to look for a tool made specifically for manipulating XML with XPath.
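
As one possible illustration of that idea, here is a rough sketch using Python's standard xml.etree.ElementTree module, which supports simple path lookups. It assumes the file is well-formed, uses no XML namespaces, and fits in memory. The names drop_products and filter_file and any file paths passed to them are hypothetical.

```python
import xml.etree.ElementTree as ET


def drop_products(root, supplier_pids):
    """Remove PRODUCT elements whose SUPPLIER_PID is in supplier_pids.

    Returns the number of PRODUCT elements removed.
    """
    wanted = set(supplier_pids)
    removed = 0
    # Snapshot the elements first so removal does not disturb the traversal.
    for parent in list(root.iter()):
        for product in list(parent.findall("PRODUCT")):
            pid = (product.findtext("SUPPLIER_PID") or "").strip()
            if pid in wanted:
                parent.remove(product)
                removed += 1
    return removed


def filter_file(src_path, dst_path, supplier_pids):
    """Read src_path, drop the listed products, and write dst_path."""
    tree = ET.parse(src_path)
    removed = drop_products(tree.getroot(), supplier_pids)
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
    return removed
```

Unlike the regex approach, this does not depend on the order of child elements or on line breaks inside a product, but it rewrites the file through a parser, so whitespace and comments outside the root element may not be preserved exactly.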

