
Regex to replace XML settings file tags to INI-like config param and value

    gamophyte
    last edited by gamophyte May 8, 2024, 6:10 AM (posted May 8, 2024, 6:05 AM)

    Hello everyone,

    Could you please help me with the following search-and-replace problem?

    I am taking XML-based configuration files and converting sections of them into something that resembles an INI file structure.

    I found an online converter, but I would like to stay within my Notepad++ workspace, since I select text and process it as I go.

    Here is the data I currently have (“before” data):

      <time_date>    
            <date_format>DD/MM/YY</date_format>    
            <hr24_clock>1</hr24_clock>    
            <ntp_dhcp_option>0</ntp_dhcp_option>    
            <ntp_server>1</ntp_server>    
            <ntp_server_addr>time1.google.com</ntp_server_addr>    
            <ntp_server_update_interval>1000</ntp_server_update_interval>    
            <timezone_dhcp_option>0</timezone_dhcp_option>    
            <selected_timezone>America/New_York</selected_timezone>
      </time_date>
    

    Here is how I would like that data to look (“after” data):

    time_date.date_format = DD/MM/YY
    time_date.hr24_clock = 1
    time_date.ntp_dhcp_option = 0
    time_date.ntp_server = 1
    time_date.ntp_server_addr = time1.google.com
    time_date.ntp_server_update_interval = 1000
    time_date.timezone_dhcp_option = 0
    time_date.selected_timezone = America/New_York
    
    KEY POINTS:
    ● The XML section name is prefixed to each parameter name (section.param).
    ● There must be a space before and after the equals sign.
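For comparison only, here is a minimal sketch of the same mapping done outside Notepad++, using Python's standard `xml.etree.ElementTree` module. It assumes each selection is a single well-formed section element whose children are simple `<param>value</param>` pairs; `section_to_ini` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def section_to_ini(xml_text: str) -> str:
    """Flatten one <section>...</section> block into 'section.param = value' lines."""
    root = ET.fromstring(xml_text)  # the outer tag is the section, e.g. <time_date>
    lines = []
    for child in root:
        # Strip surrounding whitespace from the value; empty elements give ''.
        value = (child.text or "").strip()
        lines.append(f"{root.tag}.{child.tag} = {value}")
    return "\n".join(lines)


sample = """<time_date>
    <date_format>DD/MM/YY</date_format>
    <hr24_clock>1</hr24_clock>
    <selected_timezone>America/New_York</selected_timezone>
</time_date>"""

print(section_to_ini(sample))
```

Running it prints three `time_date.` lines in the same `section.param = value` shape as the "after" data above.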
    

    Thank you!

      guy038
      last edited by May 8, 2024, 8:19 AM

      Hello, @gamophyte and All,

      I think the following regex S/R should meet your goal!


      So, starting with the text below, where I’ve intentionally added/deleted empty lines:

        <time_date>    
              <date_format>DD/MM/YY</date_format>    
              <hr24_clock>1</hr24_clock>    
      
              <ntp_dhcp_option>0</ntp_dhcp_option>    
              <ntp_server>1</ntp_server>    
              <ntp_server_addr>time1.google.com</ntp_server_addr>    
              <ntp_server_update_interval>1000</ntp_server_update_interval>    
              <timezone_dhcp_option>0</timezone_dhcp_option>    
              <selected_timezone>America/New_York</selected_timezone>
      
        </time_date>
      
      
      
        <test_one>    
              <date_format>DD/MM/YY</date_format>    
              <hr24_clock>1</hr24_clock>    
              <ntp_dhcp_option>0</ntp_dhcp_option>    
              <ntp_server>1</ntp_server>    
              <ntp_server_addr>time1.google.com</ntp_server_addr>    
              <ntp_server_update_interval>1000</ntp_server_update_interval>    
      
      
      
              <timezone_dhcp_option>0</timezone_dhcp_option>    
              <selected_timezone>America/New_York</selected_timezone>
        </test_one>
        <test_two>    
              <date_format>DD/MM/YY</date_format>    
              <hr24_clock>1</hr24_clock>    
              <ntp_dhcp_option>0</ntp_dhcp_option>    
              <ntp_server>1</ntp_server>    
              <ntp_server_addr>time1.google.com</ntp_server_addr>    
              <ntp_server_update_interval>1000</ntp_server_update_interval>    
              <timezone_dhcp_option>0</timezone_dhcp_option>    
              <selected_timezone>America/New_York</selected_timezone>
        </test_two>
      
      • Open your “source” file in N++ or, preferably, paste the text above into a new tab for testing!

      • Open the Replace dialog (Ctrl + H)

      • Uncheck all box options

      • Check the Wrap around option

      • Select the Regular expression search mode

      SEARCH (?-is)^\h+<(.+)>(.+)</.+>\h*\R+(?=(?s:.*?)^\h+</(.+)>$)|^\h+<(/)?.+>\h*\R+

      REPLACE (?1$3.$1 = $2\r\n:(?4\r\n:)

      • Click the Replace All button once

      => You should get your expected OUTPUT text:

      time_date.date_format = DD/MM/YY
      time_date.hr24_clock = 1
      time_date.ntp_dhcp_option = 0
      time_date.ntp_server = 1
      time_date.ntp_server_addr = time1.google.com
      time_date.ntp_server_update_interval = 1000
      time_date.timezone_dhcp_option = 0
      time_date.selected_timezone = America/New_York
      
      test_one.date_format = DD/MM/YY
      test_one.hr24_clock = 1
      test_one.ntp_dhcp_option = 0
      test_one.ntp_server = 1
      test_one.ntp_server_addr = time1.google.com
      test_one.ntp_server_update_interval = 1000
      test_one.timezone_dhcp_option = 0
      test_one.selected_timezone = America/New_York
      
      test_two.date_format = DD/MM/YY
      test_two.hr24_clock = 1
      test_two.ntp_dhcp_option = 0
      test_two.ntp_server = 1
      test_two.ntp_server_addr = time1.google.com
      test_two.ntp_server_update_interval = 1000
      test_two.timezone_dhcp_option = 0
      test_two.selected_timezone = America/New_York
      

      Notes:

      • Within the OUTPUT text, the part before the dot is taken from the corresponding </....> closing tag of each section in the INPUT text

      • Empty lines between two items of a section are simply ignored

      • Extra empty lines, or a missing empty-line separator, between two sections are normalized to a single empty line
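To test this kind of S/R outside Notepad++, it can be approximated with Python's `re` module. This is a sketch under stated assumptions, not the Notepad++ engine: `\h` is spelled `[ \t]`, `\R` is spelled `\r\n|\n|\r`, a replacement function stands in for the conditional replacement `(?1...:(?4...:))`, and the input is assumed to use `\n` line endings. `xml_sections_to_ini` is a hypothetical name.

```python
import re

# Assumed translations of Notepad++ (Boost) syntax into Python's re:
#   \h -> [ \t]      \R -> (?:\r\n|\n|\r)
# Python is case-sensitive and '.' skips newlines by default, as (?-is) requests.
# With re.M, '$' only matches before '\n', hence the \n line-ending assumption.
H = r"[ \t]"
R = r"(?:\r\n|\n|\r)"

PATTERN = re.compile(
    # Alternative 1: an item line; the lookahead captures (group 3) the name
    # of the nearest closing-tag line at or after the next line.
    rf"^{H}+<(.+)>(.+)</.+>{H}*{R}+(?=(?s:.*?)^{H}+</(.+)>$)"
    # Alternative 2: a section opening line, or a closing line (group 4 = '/').
    rf"|^{H}+<(/)?.+>{H}*{R}+",
    re.M,
)


def _replacement(m: re.Match) -> str:
    # Mirrors the conditional replacement (?1$3.$1 = $2\r\n:(?4\r\n:)
    if m.group(1) is not None:
        return f"{m.group(3)}.{m.group(1)} = {m.group(2)}\n"
    if m.group(4) is not None:
        return "\n"  # closing tag -> single blank separator line
    return ""        # opening tag -> removed


def xml_sections_to_ini(text: str) -> str:
    return PATTERN.sub(_replacement, text)


sample = (
    "  <time_date>    \n"
    "        <date_format>DD/MM/YY</date_format>    \n"
    "        <hr24_clock>1</hr24_clock>    \n"
    "\n"
    "        <selected_timezone>America/New_York</selected_timezone>\n"
    "  </time_date>\n"
    "\n"
    "\n"
    "  <test_one>    \n"
    "        <ntp_server>1</ntp_server>    \n"
    "        <selected_timezone>America/New_York</selected_timezone>\n"
    "  </test_one>\n"
)
print(xml_sections_to_ini(sample))
```

The gap inside the lookahead is `(?s:.*?)`, which may match zero characters, so the last item of a section, sitting directly above its closing tag, still receives its own section name rather than the next section's.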

      Best Regards,

      guy038

        gamophyte @guy038
        last edited by May 8, 2024, 1:12 PM

        @guy038 AMAZING!

        However, I realize I can’t use this on the whole document at once. I’m finding that some settings aren’t within a section after all.

        In those cases, it seems to grab the name of a neighboring section and use that as the section name.

        It’s no problem to select each section manually, so I just need another regex for when the settings are flat (not inside any section).

        Like this, from our running example:

                <date_format>DD/MM/YY</date_format>    
                <hr24_clock>1</hr24_clock>    
                <ntp_dhcp_option>0</ntp_dhcp_option>    
                <ntp_server>1</ntp_server>    
                <ntp_server_addr>time1.google.com</ntp_server_addr>    
                <ntp_server_update_interval>1000</ntp_server_update_interval>    
                <timezone_dhcp_option>0</timezone_dhcp_option>    
                <selected_timezone>America/New_York</selected_timezone>
          guy038
          last edited by guy038 May 8, 2024, 5:07 PM (posted May 8, 2024, 5:03 PM)

          Hi, @gamophyte and All,

          Ah… OK. So, here is a regex which converts the tag-value pairs inside each section, as well as any tag-value pair outside a section, but leaves the section tags themselves untouched!

          So, given this new INPUT text:

            <time_date>    
                  <date_format>DD/MM/YY</date_format>    
                  <hr24_clock>1</hr24_clock>    
                  <ntp_dhcp_option>0</ntp_dhcp_option>    
                  <ntp_server>1</ntp_server>    
                  <ntp_server_addr>time1.google.com</ntp_server_addr>    
                  <ntp_server_update_interval>1000</ntp_server_update_interval>    
                  <timezone_dhcp_option>0</timezone_dhcp_option>    
                  <selected_timezone>America/New_York</selected_timezone>
            </time_date>
          
            <test_one>    
                  <date_format>DD/MM/YY</date_format>    
                  <hr24_clock>1</hr24_clock>    
                  <ntp_dhcp_option>0</ntp_dhcp_option>    
                  <ntp_server>1</ntp_server>    
                  <ntp_server_addr>time1.google.com</ntp_server_addr>    
                  <ntp_server_update_interval>1000</ntp_server_update_interval>    
                  <timezone_dhcp_option>0</timezone_dhcp_option>    
                  <selected_timezone>America/New_York</selected_timezone>
            </test_one>
          
              <date_format>DD/MM/YY</date_format>    
              <hr24_clock>1</hr24_clock>    
              <ntp_dhcp_option>0</ntp_dhcp_option>    
              <ntp_server>1</ntp_server>    
          
            <test_two>    
                  <date_format>DD/MM/YY</date_format>    
                  <hr24_clock>1</hr24_clock>    
                  <ntp_dhcp_option>0</ntp_dhcp_option>    
                  <ntp_server>1</ntp_server>    
                  <ntp_server_addr>time1.google.com</ntp_server_addr>    
                  <ntp_server_update_interval>1000</ntp_server_update_interval>    
                  <timezone_dhcp_option>0</timezone_dhcp_option>    
                  <selected_timezone>America/New_York</selected_timezone>
            </test_two>
          
                              <ntp_server_addr>time1.google.com</ntp_server_addr>    
                              <ntp_server_update_interval>1000</ntp_server_update_interval>    
                              <timezone_dhcp_option>0</timezone_dhcp_option>    
                              <selected_timezone>America/New_York</selected_timezone>
          

          Then the following regex S/R:

          SEARCH (?-s)<(.+)>(.+)</\1>\h*

          REPLACE .$1 = $2
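The same S/R can be tried in Python's `re` module, which also supports the `\1` backreference it relies on. A sketch, assuming `\h` is approximated by `[ \t]`; `dot_params` is a hypothetical helper name.

```python
import re

# Python spelling of the S/R  (?-s)<(.+)>(.+)</\1>\h*  ->  .$1 = $2
# Assumption: \h is approximated by [ \t]; '.' already excludes newlines by default.
PAIR = re.compile(r"<(.+)>(.+)</\1>[ \t]*")


def dot_params(text: str) -> str:
    # Only a tag-value pair whose closing tag repeats the opening name (\1) is
    # rewritten; bare section tags such as <time_date> are left alone, and the
    # leading indentation survives because the pattern is not anchored with ^.
    return PAIR.sub(r".\1 = \2", text)


sample = (
    "  <time_date>    \n"
    "        <hr24_clock>1</hr24_clock>    \n"
    "  </time_date>\n"
    "    <ntp_server>1</ntp_server>    \n"
)
print(dot_params(sample))
```

Because of the `\1` backreference, a line such as `<a>1</b>` with mismatched tag names is left unchanged.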

          Would leave you with this OUTPUT text:

            <time_date>    
                  .date_format = DD/MM/YY
                  .hr24_clock = 1
                  .ntp_dhcp_option = 0
                  .ntp_server = 1
                  .ntp_server_addr = time1.google.com
                  .ntp_server_update_interval = 1000
                  .timezone_dhcp_option = 0
                  .selected_timezone = America/New_York
            </time_date>
          
            <test_one>    
                  .date_format = DD/MM/YY
                  .hr24_clock = 1
                  .ntp_dhcp_option = 0
                  .ntp_server = 1
                  .ntp_server_addr = time1.google.com
                  .ntp_server_update_interval = 1000
                  .timezone_dhcp_option = 0
                  .selected_timezone = America/New_York
            </test_one>
          
              .date_format = DD/MM/YY
              .hr24_clock = 1
              .ntp_dhcp_option = 0
              .ntp_server = 1
          
            <test_two>    
                  .date_format = DD/MM/YY
                  .hr24_clock = 1
                  .ntp_dhcp_option = 0
                  .ntp_server = 1
                  .ntp_server_addr = time1.google.com
                  .ntp_server_update_interval = 1000
                  .timezone_dhcp_option = 0
                  .selected_timezone = America/New_York
            </test_two>
          
                              .ntp_server_addr = time1.google.com
                              .ntp_server_update_interval = 1000
                              .timezone_dhcp_option = 0
                              .selected_timezone = America/New_York
          

          Is this kind of regex more useful for you?

          Of course, you’ll need a column-mode selection to add the section names right before the dot characters!
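As an alternative to the column-mode step, the Python sketch below walks the lines produced by the dot S/R while tracking the current section: a dotted line inside a section gets that section's name in front, a dotted line outside any section simply loses its dot, and the section tag lines are dropped. It assumes one level of sections (no nesting); `add_section_names` is a hypothetical helper, not a Notepad++ feature.

```python
import re

OPEN = re.compile(r"^\s*<([^/>]+)>\s*$")   # section opening line, e.g. "  <time_date>"
CLOSE = re.compile(r"^\s*</([^>]+)>\s*$")  # section closing line, e.g. "  </time_date>"
DOTTED = re.compile(r"^\s*\.(.*)$")        # converted setting, e.g. "   .hr24_clock = 1"


def add_section_names(text: str) -> str:
    out = []
    section = None  # name of the section currently open, if any
    for line in text.splitlines():
        if m := OPEN.match(line):
            section = m.group(1)
            continue  # drop the section's opening tag line
        if CLOSE.match(line):
            section = None
            continue  # drop the section's closing tag line
        if m := DOTTED.match(line):
            # Inside a section: prefix its name. Outside: just drop the dot.
            out.append(f"{section}.{m.group(1)}" if section else m.group(1))
        else:
            out.append(line.strip())  # blank or unrecognized lines, unindented
    return "\n".join(out)


sample = (
    "  <time_date>    \n"
    "        .date_format = DD/MM/YY\n"
    "        .hr24_clock = 1\n"
    "  </time_date>\n"
    "\n"
    "    .ntp_server = 1\n"
)
print(add_section_names(sample))
```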

          BR

          guy038

            gamophyte
            last edited by May 8, 2024, 5:13 PM

            This takes me much further than I was before, thanks!!

            Even now, I can use that leading dot as an anchor to do much more: remove it when no container (section) encloses the setting, and, when there is one, use S&R to put the section name in front, even if I do it one selection at a time.

            You’ve taken it far enough, thanks for your work!!

            The Community of users of the Notepad++ text editor.