    Regex to replace XML settings file tags to INI-like config param and value

    • gamophyte

      Hello everyone,

      Could you please help me with the following search-and-replace problem I am having?

      I am taking XML based configuration files, and converting sections of them to something that resembles an INI file data structure.

      I found an online converter, but I would like to stay within my Notepad++ workspace, as I select text and process it as I go.

      Here is the data I currently have (“before” data):

        <time_date>    
              <date_format>DD/MM/YY</date_format>    
              <hr24_clock>1</hr24_clock>    
              <ntp_dhcp_option>0</ntp_dhcp_option>    
              <ntp_server>1</ntp_server>    
              <ntp_server_addr>time1.google.com</ntp_server_addr>    
              <ntp_server_update_interval>1000</ntp_server_update_interval>    
              <timezone_dhcp_option>0</timezone_dhcp_option>    
              <selected_timezone>America/New_York</selected_timezone>
        </time_date>
      

      Here is how I would like that data to look (“after” data):

      time_date.date_format = DD/MM/YY
      time_date.hr24_clock = 1
      time_date.ntp_dhcp_option = 0
      time_date.ntp_server = 1
      time_date.ntp_server_addr = time1.google.com
      time_date.ntp_server_update_interval = 1000
      time_date.timezone_dhcp_option = 0
      time_date.selected_timezone = America/New_York
      
      KEY POINTS:
      ● The XML section name is prepended to the param name, separated by a dot.  
      ● There must be a space before, and after, the equals symbol.
      

      Thank you!

      • guy038

        Hello, @gamophyte and All,

        I suppose that the following regex S/R should meet your goal !


        So, starting with the text, below, where I’ve, intentionally, added / deleted empty lines :

          <time_date>    
                <date_format>DD/MM/YY</date_format>    
                <hr24_clock>1</hr24_clock>    
        
                <ntp_dhcp_option>0</ntp_dhcp_option>    
                <ntp_server>1</ntp_server>    
                <ntp_server_addr>time1.google.com</ntp_server_addr>    
                <ntp_server_update_interval>1000</ntp_server_update_interval>    
                <timezone_dhcp_option>0</timezone_dhcp_option>    
                <selected_timezone>America/New_York</selected_timezone>
        
          </time_date>
        
        
        
          <test_one>    
                <date_format>DD/MM/YY</date_format>    
                <hr24_clock>1</hr24_clock>    
                <ntp_dhcp_option>0</ntp_dhcp_option>    
                <ntp_server>1</ntp_server>    
                <ntp_server_addr>time1.google.com</ntp_server_addr>    
                <ntp_server_update_interval>1000</ntp_server_update_interval>    
        
        
        
                <timezone_dhcp_option>0</timezone_dhcp_option>    
                <selected_timezone>America/New_York</selected_timezone>
          </test_one>
          <test_two>    
                <date_format>DD/MM/YY</date_format>    
                <hr24_clock>1</hr24_clock>    
                <ntp_dhcp_option>0</ntp_dhcp_option>    
                <ntp_server>1</ntp_server>    
                <ntp_server_addr>time1.google.com</ntp_server_addr>    
                <ntp_server_update_interval>1000</ntp_server_update_interval>    
                <timezone_dhcp_option>0</timezone_dhcp_option>    
                <selected_timezone>America/New_York</selected_timezone>
          </test_two>
        
        • Open your “source” file in N++, or, preferably, the text above in a new tab, for testing !

        • Open the Replace dialog ( Ctrl + H )

        • Uncheck all box options

        • Check the Wrap around option

        • Select the Regular expression search mode

        SEARCH (?-is)^\h+<(.+)>(.+)</.+>\h*\R+(?=(?s:.*?)^\h+</(.+)>$)|^\h+<(/)?.+>\h*\R+

        REPLACE (?1$3.$1 = $2\r\n:(?4\r\n:)

        • Click once only the Replace All button

        => You should get your expected OUTPUT text :

        time_date.date_format = DD/MM/YY
        time_date.hr24_clock = 1
        time_date.ntp_dhcp_option = 0
        time_date.ntp_server = 1
        time_date.ntp_server_addr = time1.google.com
        time_date.ntp_server_update_interval = 1000
        time_date.timezone_dhcp_option = 0
        time_date.selected_timezone = America/New_York
        
        test_one.date_format = DD/MM/YY
        test_one.hr24_clock = 1
        test_one.ntp_dhcp_option = 0
        test_one.ntp_server = 1
        test_one.ntp_server_addr = time1.google.com
        test_one.ntp_server_update_interval = 1000
        test_one.timezone_dhcp_option = 0
        test_one.selected_timezone = America/New_York
        
        test_two.date_format = DD/MM/YY
        test_two.hr24_clock = 1
        test_two.ntp_dhcp_option = 0
        test_two.ntp_server = 1
        test_two.ntp_server_addr = time1.google.com
        test_two.ntp_server_update_interval = 1000
        test_two.timezone_dhcp_option = 0
        test_two.selected_timezone = America/New_York
        
        

        Notes :

        • Within the OUTPUT text, the first part before the dot is based on each corresponding </....> closing tag, within the INPUT text

        • Empty lines, between two items of a section, are simply ignored

        • Additional empty lines or missing empty line separator, between two sections, are normalized to a single empty line
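
For anyone who would rather script this outside Notepad++, here is a minimal Python sketch of the behaviour listed in the notes above. It is not a port of the regex: it walks the text line by line and tracks the current section. The function name `xml_to_ini` and the three patterns are made up for this illustration, and writing a tag-value pair found outside any section without a prefix is an assumption of the sketch, not something the S/R above does.

```python
import re

# Hypothetical helper patterns (illustration only, not part of Notepad++)
OPEN = re.compile(r"^\s*<([^<>/]+)>\s*$")            # <section>
CLOSE = re.compile(r"^\s*</([^<>/]+)>\s*$")          # </section>
ITEM = re.compile(r"^\s*<([^<>/]+)>(.+)</\1>\s*$")   # <tag>value</tag>

def xml_to_ini(text):
    """Turn <section><tag>value</tag>...</section> blocks into 'section.tag = value' lines."""
    out = []
    section = None
    for line in text.splitlines():
        if not line.strip():
            continue                      # empty lines are simply ignored
        m = OPEN.match(line)
        if m:
            section = m.group(1)          # remember the section name
            continue
        if CLOSE.match(line):
            section = None
            out.append("")                # exactly one empty line after each section
            continue
        m = ITEM.match(line)
        if m:
            # Assumption: pairs outside any section get no prefix at all
            prefix = section + "." if section else ""
            out.append(f"{prefix}{m.group(1)} = {m.group(2)}")
        else:
            out.append(line)              # leave anything unrecognised untouched
    return "\n".join(out)
```

Calling `xml_to_ini()` on the test text above should give one `section.tag = value` line per item, with a single empty line after each section.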

        Best Regards,

        guy038

        • gamophyte @guy038

          @guy038 AMAZING!

          However I realize I can’t use this wholesale on the whole document. I’m finding that some settings aren’t within a section after all.

          In those cases, it appears to grab a value from a neighbor and uses that as the section name.

          It’s no issue just to select each section manually, and so I will just need another Regex for when the section is flat.

          Like just below in our running example:

                  <date_format>DD/MM/YY</date_format>    
                  <hr24_clock>1</hr24_clock>    
                  <ntp_dhcp_option>0</ntp_dhcp_option>    
                  <ntp_server>1</ntp_server>    
                  <ntp_server_addr>time1.google.com</ntp_server_addr>    
                  <ntp_server_update_interval>1000</ntp_server_update_interval>    
                  <timezone_dhcp_option>0</timezone_dhcp_option>    
                  <selected_timezone>America/New_York</selected_timezone>
          • guy038

            Hi, @gamophyte and All,

            Ah… OK. So here is, below, a regex which only partially changes the contents of each section, as well as any tag-value pair outside a section !

            So, given this new INPUT text, below :

              <time_date>    
                    <date_format>DD/MM/YY</date_format>    
                    <hr24_clock>1</hr24_clock>    
                    <ntp_dhcp_option>0</ntp_dhcp_option>    
                    <ntp_server>1</ntp_server>    
                    <ntp_server_addr>time1.google.com</ntp_server_addr>    
                    <ntp_server_update_interval>1000</ntp_server_update_interval>    
                    <timezone_dhcp_option>0</timezone_dhcp_option>    
                    <selected_timezone>America/New_York</selected_timezone>
              </time_date>
            
              <test_one>    
                    <date_format>DD/MM/YY</date_format>    
                    <hr24_clock>1</hr24_clock>    
                    <ntp_dhcp_option>0</ntp_dhcp_option>    
                    <ntp_server>1</ntp_server>    
                    <ntp_server_addr>time1.google.com</ntp_server_addr>    
                    <ntp_server_update_interval>1000</ntp_server_update_interval>    
                    <timezone_dhcp_option>0</timezone_dhcp_option>    
                    <selected_timezone>America/New_York</selected_timezone>
              </test_one>
            
                <date_format>DD/MM/YY</date_format>    
                <hr24_clock>1</hr24_clock>    
                <ntp_dhcp_option>0</ntp_dhcp_option>    
                <ntp_server>1</ntp_server>    
            
              <test_two>    
                    <date_format>DD/MM/YY</date_format>    
                    <hr24_clock>1</hr24_clock>    
                    <ntp_dhcp_option>0</ntp_dhcp_option>    
                    <ntp_server>1</ntp_server>    
                    <ntp_server_addr>time1.google.com</ntp_server_addr>    
                    <ntp_server_update_interval>1000</ntp_server_update_interval>    
                    <timezone_dhcp_option>0</timezone_dhcp_option>    
                    <selected_timezone>America/New_York</selected_timezone>
              </test_two>
            
                                <ntp_server_addr>time1.google.com</ntp_server_addr>    
                                <ntp_server_update_interval>1000</ntp_server_update_interval>    
                                <timezone_dhcp_option>0</timezone_dhcp_option>    
                                <selected_timezone>America/New_York</selected_timezone>
            

            Then, the following regex S/R :

            SEARCH (?-s)<(.+)>(.+)</\1>\h*

            REPLACE .$1 = $2

            Would leave you with this OUTPUT text :

              <time_date>    
                    .date_format = DD/MM/YY
                    .hr24_clock = 1
                    .ntp_dhcp_option = 0
                    .ntp_server = 1
                    .ntp_server_addr = time1.google.com
                    .ntp_server_update_interval = 1000
                    .timezone_dhcp_option = 0
                    .selected_timezone = America/New_York
              </time_date>
            
              <test_one>    
                    .date_format = DD/MM/YY
                    .hr24_clock = 1
                    .ntp_dhcp_option = 0
                    .ntp_server = 1
                    .ntp_server_addr = time1.google.com
                    .ntp_server_update_interval = 1000
                    .timezone_dhcp_option = 0
                    .selected_timezone = America/New_York
              </test_one>
            
                .date_format = DD/MM/YY
                .hr24_clock = 1
                .ntp_dhcp_option = 0
                .ntp_server = 1
            
              <test_two>    
                    .date_format = DD/MM/YY
                    .hr24_clock = 1
                    .ntp_dhcp_option = 0
                    .ntp_server = 1
                    .ntp_server_addr = time1.google.com
                    .ntp_server_update_interval = 1000
                    .timezone_dhcp_option = 0
                    .selected_timezone = America/New_York
              </test_two>
            
                                .ntp_server_addr = time1.google.com
                                .ntp_server_update_interval = 1000
                                .timezone_dhcp_option = 0
                                .selected_timezone = America/New_York
            

            Do you find this kind of regex more useful for you ?

            Of course, you’ll need to use a column-mode selection to add section names, right before the dot characters !
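
As a rough cross-check outside Notepad++, the same S/R can be sketched with Python's `re` module. Python has no `\h`, so `[ \t]` stands in for horizontal whitespace here, and `.` already stops at line breaks by default, which plays the role of `(?-s)`. The names `PAIR` and `dot_prefix` are made up for this sketch.

```python
import re

# <tag>value</tag> on one line, the closing tag must repeat the opening one (\1);
# [ \t]* replaces \h* because Python's re module has no \h
PAIR = re.compile(r"<(.+)>(.+)</\1>[ \t]*")

def dot_prefix(text):
    """Rewrite every <tag>value</tag> pair as '.tag = value', leaving section tags alone."""
    return PAIR.sub(r".\1 = \2", text)
```

Lines holding only a section tag, such as `<time_date>`, contain no closing tag on the same line, so they are left untouched, as in the OUTPUT text above.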

            BR

            guy038

            • gamophyte

              This takes me far beyond where I was before, thanks!!

              Even now, I can use that leading dot as an anchor to do much more: clean it up when no container (section) encloses the setting, and, when one does, S&R the section name onto the front, even if that means selecting one section at a time.
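
A minimal Python sketch of that follow-up step, assuming the text has already been through the dot-prefix S/R: it prepends the enclosing section name to each `.tag = value` line and drops the dot when no section encloses it. The function name `attach_sections` is hypothetical, and removing the section tag lines from the result is a choice of this sketch.

```python
import re

def attach_sections(text):
    """Use the leading dot as an anchor: add the section name, or strip the dot if there is none."""
    out = []
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        opening = re.fullmatch(r"<([^<>/]+)>", stripped)
        closing = re.fullmatch(r"</([^<>/]+)>", stripped)
        if opening:
            section = opening.group(1)    # entering a section, tag line is dropped
        elif closing:
            section = None                # leaving the section, tag line is dropped
        elif stripped.startswith("."):
            # inside a section: 'section' + '.tag = value'; outside: just 'tag = value'
            out.append(section + stripped if section else stripped[1:])
        else:
            out.append(line)              # empty or unrecognised lines pass through
    return "\n".join(out)
```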

              You’ve taken it far enough, thanks for your work!!
