Community
    • Login

    Copying content of an Excel column to XML in Notepad++

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    20 Posts 2 Posters 14.5k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Claudia FrankC
      Claudia Frank @Bart Hoogland
      last edited by

      Hi Bart,

      yes, if it can be converted to a csv like style, (actually I think comma separated isn’t good in this
      case as a comment might have a comma inside, so maybe convert to |(pipe) or ;; two semicolons etc…, something unique ) and writing a script could maybe completely automate generation of the required xml or help to do the main part of creation.

      But can only say for sure if we know the details.

      Cheers
      Claudia

      1 Reply Last reply Reply Quote 0
      • Bart HooglandB
        Bart Hoogland
        last edited by

        Hi Claudia,

        Can you specify what kind of details you would require? Perhaps I could send you some files by e-mail and explain exactly what I’m trying to achieve?

        I’m not familiar with writing scripts to automate generation either, hopefully you can assist me with this too or direct me to a helpful resource :)

        Thanks again!

        Bart

        Claudia FrankC 1 Reply Last reply Reply Quote 0
        • Claudia FrankC
          Claudia Frank @Bart Hoogland
          last edited by Claudia Frank

          Hello Bart,

          doing it offline isn’t what I want to do as the forum should benefit from it as well.
          Of course I will offer my help creating such a script if
          a) it can be done with a simple script
          b) you are willing to test extensively and
          c) at the end it is up to you to decide if you want to use such a script.

          I do understand that your primary goal is to create a xml file, correct?
          If so, how many sources are involved? I mean, is the Excel sheet the only source of data
          to create the xml or are other sources needed?

          How do you create the xml at the moment? Manually?

          If you export the excel sheet as a | (pipe) separated list (like described here), can npp+ open and display it correctly?

          If the excel sheet is the only source of data, which column represents which tag/attribute in the xml?

          Cheers
          Claudia

          1 Reply Last reply Reply Quote 0
          • Bart HooglandB
            Bart Hoogland
            last edited by Bart Hoogland

            Hi Claudia,

            Ah, I fully understand. Let’s try to solve it online then! :)

            Yep I’m trying to create an .xml file. Only one source document is involved (the Excel sheet). I’m creating it manually at the moment.

            The .xml I’m trying to create can be found here: https://developers.google.com/product-review-feeds/. The main thing I want to do is use an Excel file to build the .xml structure as described on that Google page. Hopefully it can be done.

            Initially I tried converting my Excel file into an .xml, however that gave me some complications with the parent-child relationships in the feed (apparently Excel doesn’t support these). So I’ve worked the other way around, creating an .xml file from scratch and using another program (Liquid Studio, which trial has now run out) to bulk edit the column’s contents.

            Since I don’t have an Excel file I converted into .xml (like usual), I’ve tried opening the .xml file in Excel and export is as a pipe separated list. I’m guessing it looks okay?

            Claudia FrankC 1 Reply Last reply Reply Quote 0
            • Claudia FrankC
              Claudia Frank @Bart Hoogland
              last edited by

              Hello Bart,

              ok, just in case that you haven’t installed python script plugin yet, please download the msi package from here and install it.
              It is available via plugin manager as well but it has been reported that it fails to completely install too often.

              Once python script plugin has been installed use Plugins->Python Script->Show Console to see if it is working properly.
              On the bottom of the console there is a textbox, please put in the following code and press run.

              notepad.new()
              

              If a new document has opened we can go on with the script. Goto Plugins->Python Script->New script and give it a meaningful name.
              Save it.
              Put the following content into the newly created document.

              header = '<?xml version="1.0" encoding="UTF-8"?>\r\n'
              schema_versioning = '"http://www.w3.org/2007/XMLSchema-versioning"'
              schema_instance = '"http://www.w3.org/2001/XMLSchema-instance"'
              schema_location = '"http://www.google.com/shopping/reviews/schema/product/2.1/product_reviews.xsd"'
              feed = '<feed xmlns:vc={0} xmlns:xsi={1} xsi:noNamespaceSchemaLocation={2}>\r\n'.format(schema_versioning,schema_instance,schema_location)
                  
              publisher_tag = '''
                  <publisher>
                      <name>{0}</name>
                  </publisher>
              '''
              
              review_tag = '''
                      <review>
                          <reviewer>
                              <name>{0}</name>
                          </reviewer>
                          <review_timestamp>{1}</review_timestamp>
                          <content>{2}</content>
                          <review_url type="group">{3}</review_url>
                          <ratings>
                              <overall min="1" max="5">{4}</overall>
                          </ratings>
                          <products>
                              <product>
                                  <product_ids>
                                      <gtins>
                                          <gtin>{5}</gtin>
                                      </gtins>
                                  </product_ids>
                                  <product_name>{6}</product_name>
                                  <product_url>{7}</product_url>
                              </product>
                          </products>
                      </review>
              '''
              
              number_of_parts = 9
              
              
              publisher = ''
              reviews = ''
              
              def get_review_parts(content,linenumber,total):
                  global reviews
                  global publisher
                  if linenumber > 0: # to skip header line
                      parts=content.strip().split("|")
                      if number_of_parts != len(parts):
                          if not content.strip().isspace() and (len(content.strip()) > 0) :
                              console.write('ERROR: Please check line:{}\n'.format(linenumber+1))
                      else:
                          if publisher == '':
                              publisher = publisher_tag.format(parts[0])
                          reviews += review_tag.format(*parts[1:number_of_parts+1])
              
                      
              
              editor.forEachLine(get_review_parts)
              
              notepad.new()
              
              editor.addText(header)
              editor.addText(feed)
              editor.addText(publisher)
              editor.addText('    <reviews>\r\n')
              editor.addText(reviews)
              editor.addText('    </reviews>\r\n')
              editor.addText('</feed>')
              notepad.saveAs('c:\\review.xml')
              
              # do some checks
              # notepad.runPluginCommand('XML Tools', 'Check XML syntax now')
              # notepad.runPluginCommand('XML Tools', 'Validate now')
              

              Save it -> Done (Don’t reformat the code - python might not like it)

              As a test for this script use the following content in a newly created document (saving the file is not needed)

              name0|name1|name2|name3|name4|name5|name6|name7|name8
              Re-Age.nl|ReviewerName|17:00|Content Text|url adress|myrating|gtin value|productname value|product_url_value
              Re-Age.nl|Guido|12:00|Creator of Python|www.python.org|5|0123456789|python|www.python.org
              Re-Age.nl|Raymond|19:00|Core python developer|python at rcn dot com|5||PYTHON|WWW.PYTHON.ORG
              

              Having this newly created file open goto Plugins->Python Script->Scripts and click on your created script.
              A new doc should open with the newly created xml.

              As you see in the script there are two lines commented

              # notepad.runPluginCommand('XML Tools', 'Check XML syntax now')
              # notepad.runPluginCommand('XML Tools', 'Validate now')
              

              If you haven’t installed the xml tools plugin already, I would recommend you do this via the plugin manager
              and then uncomment this two lines by selecting them and right-click use Block uncomment.
              This should result in

              notepad.runPluginCommand('XML Tools', 'Check XML syntax now')
              notepad.runPluginCommand('XML Tools', 'Validate now')
              

              Save it, go back to the test document and rerun the script.
              Once the newly created document has been opened you should see a message box telling you whether there are
              problems with the syntax or not. Once confirmed the validation will take place - which in this cases fails as the content
              of the test doc isn’t compliant to the schema files.

              If this all works - we can go on finalizing the script by identifying which column must be used for which part.
              Therefore it would be good to have one line (obviously not the header) from the excel sheet
              and the resulting review tag from your manually created xml.

              How does it work, basically we replace the numbers encased with the curly braces within the predefined tags (publisher_tag, review_tag)
              with the real content of the excel sheet file.

              I could have used different techniques to create/modify the resulting xml, but I thought this one is the easiest to understand.

              Cheers
              Claudia

              1 Reply Last reply Reply Quote 0
              • Bart HooglandB
                Bart Hoogland
                last edited by

                Hi Claudia,

                Thanks so much for the extensive reply. Unfortunately I’m already a little bit stuck at the first step. I’ve downloaded and installed the MSI package, however it is not showing up as a plugin under the ‘Plugins’ dropdown menu. Only Converter and MIME Tools are listed there. Even restarted my computer but to no avail. Any idea what’s up?

                Regards,

                Bart

                Claudia FrankC 1 Reply Last reply Reply Quote 0
                • Claudia FrankC
                  Claudia Frank @Bart Hoogland
                  last edited by

                  @Bart-Hoogland

                  could it be that you are using notepad++ 64 bit version??
                  Currently, python script is only available as a 32bit plugin.

                  If you are using 32bit, can you please post the debug info?
                  Available under the ? menu.

                  Cheers
                  Claudia

                  1 Reply Last reply Reply Quote 0
                  • Bart HooglandB
                    Bart Hoogland
                    last edited by

                    Hi Claudia,

                    That was indeed the problem, it works fine in the 32bit version.

                    This is awesome. I’ve followed all your steps and it created the .XML flawlessly (outside of the errors in the validation you mentioned there would be).

                    Not sure what format you would like to have the first line from the Excel sheet and the review tag from the xml file, so I’ve just copy pasted both here. Please let me know if you want these in another format!

                    First two Excel rows (added the header, row 1, for reference):

                    ROW 1: publisher name review_timestamp content review_url overall gtin product_name product_url
                    ROW 2: Re-Age.nl Arlo Vlaar 2016-06-16T00:00:00Z Wat een super product ik als restaurant eigenaar ben hier blij mee. Ik kan gewoon een wijntje drinken en voel me de volgende dag weer top! Ik heb veel minder last van een kater gevoel: echt een aanrader. https://www.re-age.nl/liver-love/tab-review 5 8719324147040 Liver Love https://www.re-age.nl/liver-love

                    And how it looks in my xml file:

                        <review>
                            <reviewer>
                                <name>Arlo Vlaar</name>
                            </reviewer>
                            <review_timestamp>2016-06-16T00:00:00Z</review_timestamp>
                            <content>Wat een super product ik als restaurant eigenaar ben hier blij mee. Ik kan gewoon een wijntje drinken en voel me de volgende dag weer top! Ik heb veel minder last van een kater gevoel: echt een aanrader.</content>
                            <review_url type="group">https://www.re-age.nl/liver-love/tab-review</review_url>
                            <ratings>
                                <overall min="1" max="5">5</overall>
                            </ratings>
                            <products>
                                <product>
                                    <product_ids>
                                        <gtins>
                                            <gtin>8719324147040</gtin>
                                        </gtins>
                                    </product_ids>
                                    <product_name>Liver Love</product_name>
                            <product_url>https://www.re-age.nl/liver-love</product_url></product></products>
                        </review>
                    

                    Once again, thank you so much for the big help!

                    Claudia FrankC 1 Reply Last reply Reply Quote 1
                    • Claudia FrankC
                      Claudia Frank @Bart Hoogland
                      last edited by

                      Hi Bart,

                      looks like the script is already ready to run.
                      My test data layout and yours seem to be the same.
                      May you give it a try with the real data?
                      If xml gets created, the two checks should pass without error.

                      Cheers
                      Claudia

                      1 Reply Last reply Reply Quote 0
                      • Bart HooglandB
                        Bart Hoogland
                        last edited by

                        Hi Claudia,

                        Works perfectly, except for one final small hiccup. It appears that it doesn’t know how to handle the special character ‘é’ as it gives an XML Parsing error for the line containing this character. Any workaround for this?

                        Also there’s a blank line between each separate review, which I guess you did on purpose? I like it as it really increases the readability of the XML file.

                        Thanks!

                        Claudia FrankC 1 Reply Last reply Reply Quote 1
                        • Claudia FrankC
                          Claudia Frank @Bart Hoogland
                          last edited by

                          Hello Bart,

                          does this message appear in the python script console or does it happen
                          when checking xml syntax or xml validation?
                          If it appears in the console, can post the output?

                          Cheers
                          Claudia

                          1 Reply Last reply Reply Quote 0
                          • Claudia FrankC
                            Claudia Frank
                            last edited by

                            OK, was able to simulate this issue - it looks like your
                            excel sheet export isn’t utf-8 encoded, could this be?
                            It has to be utf-8 encoded.

                            Cheers
                            Claudia

                            1 Reply Last reply Reply Quote 0
                            • Bart HooglandB
                              Bart Hoogland
                              last edited by

                              That was indeed the issue Claudia! I just ran the script again and it gave me no errors and the XML structure appears to be valid. Thank you so much, you’re a lifesaver.

                              Regarding the script, say my Excel sheet contains a new column so I need to add in an extra element in the review tag. Let’s say under the last element (product_url {7}). Would I just be able to add in an extra line in the script with {8} and change number_of_parts from 9 into 10 to make it work?

                              Regards,

                              Bart

                              1 Reply Last reply Reply Quote 0
                              • Claudia FrankC
                                Claudia Frank
                                last edited by Claudia Frank

                                In theory correct Bart,

                                one improvement, let the script count the parts from the templates.
                                Instead of

                                number_of_parts = 9
                                

                                we can use

                                number_of_parts = review_tag.count('{') + publisher_tag.count('{')
                                

                                The critical part is this line

                                reviews += review_tag.format(*parts[1:number_of_parts+1])
                                

                                As long as the placeholders (the {NUMBER}) are a ascending sequence it should work,
                                as the code line does the following

                                get the template review_tag string and use format funtion to replace the placeholders with the
                                items, starting from 1 (which is basically the second item as it starts counting from 0) to the
                                number_of_parts plus one (as python does not include the end item) of the list parts.

                                If you need more control because you want to be able assign every item independently we need to do something like

                                reviews += review_tag.format(parts[1],parts[3],parts[6],parts[2],...)
                                

                                which means, placeholder 0 from review_tag would be replaced with the second item of parts,
                                placeholder 1 with the fourth item, placeholder 2 with the seventh item, placeholder 3 with the third item and so on)

                                Cheers
                                Claudia

                                1 Reply Last reply Reply Quote 0
                                • Bart HooglandB
                                  Bart Hoogland
                                  last edited by

                                  Hi Claudia,

                                  Cool, that really is an improvement. I’ve replaced the number_of_parts = 9 piece with the line you suggested. So in the future I will only have to worry about changing the numbers between the curly brackets and make sure they are (still) in an ascending order.

                                  The alternative reviews += bit seems a little unnecessarily complicated for now, so I’ll just make sure I put the numbers in the correct order :)

                                  Regards,

                                  Bart

                                  1 Reply Last reply Reply Quote 1
                                  • Claudia FrankC
                                    Claudia Frank
                                    last edited by

                                    Hello Bart,

                                    after rereading my previous answer I have to correct myself.

                                    this

                                    As long as the placeholders (the {NUMBER}) are a ascending sequence it should work,

                                    is not strictly true. You can use the curly braces without a number, then it is up to python
                                    to replace each placeholder with the corresponding item.

                                    Means, we could have written

                                            <reviewer>
                                                <name>{}</name>
                                            </reviewer>
                                            <review_timestamp>{}</review_timestamp>
                                            <content>{}</content>
                                            <review_url type="group">{}</review_url>
                                            <ratings>
                                                <overall min="1" max="5">{}</overall>
                                            </ratings>
                                            ...
                                    

                                    This will also work but, especially for python beginners, I wouldn’t recommend it,
                                    because, from my point of view, it makes it less readable and you loose the functionality
                                    to use the alternative creation at all.

                                    Cheers
                                    Claudia

                                    1 Reply Last reply Reply Quote 0
                                    • Bart HooglandB
                                      Bart Hoogland
                                      last edited by

                                      Hi Claudia,

                                      I’ll keep it as it is then :). Cheers!

                                      1 Reply Last reply Reply Quote 0
                                      • First post
                                        Last post
                                      The Community of users of the Notepad++ text editor.
                                      Powered by NodeBB | Contributors