Copying content of an Excel column to XML in Notepad++



  • @Bart-Hoogland

    could it be that you are using notepad++ 64 bit version??
    Currently, python script is only available as a 32bit plugin.

    If you are using 32bit, can you please post the debug info?
    Available under the ? menu.

    Cheers
    Claudia



  • Hi Claudia,

    That was indeed the problem, it works fine in the 32bit version.

    This is awesome. I’ve followed all your steps and it created the .XML flawlessly (outside of the errors in the validation you mentioned there would be).

    Not sure what format you would like to have the first line from the Excel sheet and the review tag from the xml file, so I’ve just copy pasted both here. Please let me know if you want these in another format!

    First two Excel rows (added the header, row 1, for reference):

    ROW 1: publisher name review_timestamp content review_url overall gtin product_name product_url
    ROW 2: Re-Age.nl Arlo Vlaar 2016-06-16T00:00:00Z Wat een super product ik als restaurant eigenaar ben hier blij mee. Ik kan gewoon een wijntje drinken en voel me de volgende dag weer top! Ik heb veel minder last van een kater gevoel: echt een aanrader. https://www.re-age.nl/liver-love/tab-review 5 8719324147040 Liver Love https://www.re-age.nl/liver-love

    And how it looks in my xml file:

        <review>
            <reviewer>
                <name>Arlo Vlaar</name>
            </reviewer>
            <review_timestamp>2016-06-16T00:00:00Z</review_timestamp>
            <content>Wat een super product ik als restaurant eigenaar ben hier blij mee. Ik kan gewoon een wijntje drinken en voel me de volgende dag weer top! Ik heb veel minder last van een kater gevoel: echt een aanrader.</content>
            <review_url type="group">https://www.re-age.nl/liver-love/tab-review</review_url>
            <ratings>
                <overall min="1" max="5">5</overall>
            </ratings>
            <products>
                <product>
                    <product_ids>
                        <gtins>
                            <gtin>8719324147040</gtin>
                        </gtins>
                    </product_ids>
                    <product_name>Liver Love</product_name>
            <product_url>https://www.re-age.nl/liver-love</product_url></product></products>
        </review>
    

    Once again, thank you so much for the big help!



  • Hi Bart,

    looks like the script is already ready to run.
    My test data layout and yours seem to be the same.
    May you give it a try with the real data?
    If xml gets created, the two checks should pass without error.

    Cheers
    Claudia



  • Hi Claudia,

    Works perfectly, except for one final small hiccup. It appears that it doesn’t know how to handle the special character ‘é’ as it gives an XML Parsing error for the line containing this character. Any workaround for this?

    Also there’s a blank line between each separate review, which I guess you did on purpose? I like it as it really increases the readability of the XML file.

    Thanks!



  • Hello Bart,

    does this message appear in the python script console or does it happen
    when checking xml syntax or xml validation?
    If it appears in the console, can post the output?

    Cheers
    Claudia



  • OK, was able to simulate this issue - it looks like your
    excel sheet export isn’t utf-8 encoded, could this be?
    It has to be utf-8 encoded.

    Cheers
    Claudia



  • That was indeed the issue Claudia! I just ran the script again and it gave me no errors and the XML structure appears to be valid. Thank you so much, you’re a lifesaver.

    Regarding the script, say my Excel sheet contains a new column so I need to add in an extra element in the review tag. Let’s say under the last element (product_url {7}). Would I just be able to add in an extra line in the script with {8} and change number_of_parts from 9 into 10 to make it work?

    Regards,

    Bart



  • In theory correct Bart,

    one improvement, let the script count the parts from the templates.
    Instead of

    number_of_parts = 9
    

    we can use

    number_of_parts = review_tag.count('{') + publisher_tag.count('{')
    

    The critical part is this line

    reviews += review_tag.format(*parts[1:number_of_parts+1])
    

    As long as the placeholders (the {NUMBER}) are a ascending sequence it should work,
    as the code line does the following

    get the template review_tag string and use format funtion to replace the placeholders with the
    items, starting from 1 (which is basically the second item as it starts counting from 0) to the
    number_of_parts plus one (as python does not include the end item) of the list parts.

    If you need more control because you want to be able assign every item independently we need to do something like

    reviews += review_tag.format(parts[1],parts[3],parts[6],parts[2],...)
    

    which means, placeholder 0 from review_tag would be replaced with the second item of parts,
    placeholder 1 with the fourth item, placeholder 2 with the seventh item, placeholder 3 with the third item and so on)

    Cheers
    Claudia



  • Hi Claudia,

    Cool, that really is an improvement. I’ve replaced the number_of_parts = 9 piece with the line you suggested. So in the future I will only have to worry about changing the numbers between the curly brackets and make sure they are (still) in an ascending order.

    The alternative reviews += bit seems a little unnecessarily complicated for now, so I’ll just make sure I put the numbers in the correct order :)

    Regards,

    Bart



  • Hello Bart,

    after rereading my previous answer I have to correct myself.

    this

    As long as the placeholders (the {NUMBER}) are a ascending sequence it should work,

    is not strictly true. You can use the curly braces without a number, then it is up to python
    to replace each placeholder with the corresponding item.

    Means, we could have written

            <reviewer>
                <name>{}</name>
            </reviewer>
            <review_timestamp>{}</review_timestamp>
            <content>{}</content>
            <review_url type="group">{}</review_url>
            <ratings>
                <overall min="1" max="5">{}</overall>
            </ratings>
            ...
    

    This will also work but, especially for python beginners, I wouldn’t recommend it,
    because, from my point of view, it makes it less readable and you loose the functionality
    to use the alternative creation at all.

    Cheers
    Claudia



  • Hi Claudia,

    I’ll keep it as it is then :). Cheers!


Log in to reply