
    Looking for a more efficient Regex to merge lines from different parts of the doc into one

    Help wanted
    • gamophyte

      Hello, I am building a macro to convert an old device config file into a new device config file, and part of the conversion requires merging two lines into one.

      To do so, my shortcuts.xml is full of “find and replace all” entries whose regular expressions store captures with (...).

      However, with the only method I know, a single “replace all” over the document doesn’t convert everything, because the find leaps over intervening data with a multi-line capture, (.*)((?s).*?), so I’m hoping for another method.

      Example Data in the old config:
      red.1 = a setting value
      red.2 = another setting value
      red.3 = yet another setting value
      
      blue.1 = a setting value
      blue.2 = another setting value
      blue.3 = yet another setting value
      

      Note: These can go up to number 16, so I will use (\d\d?) to capture one- or two-digit numbers.

      Desired “merged” result, matching params by number:
      purple.1 = a setting value
      purple.2 = another setting value
      purple.3 = yet another setting value
      

      Note: To simplify the example, I’m ignoring the values of the blue params. The blue lines will still be part of the find, so that both red and blue are eliminated in favor of purple.

      What I know works for this:

      Find:
      red.(\d\d?) = (.*)((?s).*?)blue.\1 = .*
      Replace:
      purple.\1 = \2\3

      It works: it grabs the red param and value, leaps over multiple lines, grabs the blue param with the same number, and then writes the lines it “leapt” over back out with \3 (so they’re not destroyed). But because those in-between lines are consumed by the match, a single “replace all” won’t convert all of them.

      Done manually, you would have to hit “replace all” three times to convert all entries. Putting this in shortcuts.xml, I would have to repeat that entry three times. It gets worse with 16 entries!

      Up to this point, “replace all” has worked perfectly for multi-line finds, but because this find grabs and stores so many lines at once, one “replace all” can’t finish the job.
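To see the pass count concretely, here is a minimal Python sketch. Assumptions: Python's `re` module stands in for the Boost engine Notepad++ uses, the scoped group `(?s:.*?)` is taken as equivalent to `((?s).*?)`, and the helper `replace_until_stable` is hypothetical. Each `re.subn` call plays the role of one “replace all”.

```python
import re

# The original find/replace in Python's re syntax. Assumption: the scoped
# group (?s:.*?) behaves like ((?s).*?) does in Notepad++'s Boost engine.
FIND = r"red\.(\d\d?) = (.*)((?s:.*?))blue\.\1 = .*"
REPLACE = r"purple.\1 = \2\3"


def replace_until_stable(find, replace, text):
    """Hypothetical helper: repeat a "replace all" until nothing changes.

    Returns the final text and the number of passes that made replacements.
    """
    passes = 0
    while True:
        text, count = re.subn(find, replace, text)
        if count == 0:
            return text, passes
        passes += 1


old_config = (
    "red.1 = a\nred.2 = b\nred.3 = c\n\n"
    "blue.1 = x\nblue.2 = y\nblue.3 = z\n"
)

# One "replace all": group 3 swallows red.2 and red.3, and the scan resumes
# after blue.1, so only the first pair is converted.
one_pass, n = re.subn(FIND, REPLACE, old_config)
print(n)  # 1

final, passes = replace_until_stable(FIND, REPLACE, old_config)
print(passes)  # 3 (one "replace all" per red/blue pair)
```

With 16 pairs the same loop would need up to 16 passes, which matches the 16 shortcuts.xml entries described above.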

      Do you know of another method or regex to avoid this issue?

      My workaround so far is to convert red into purple up front, using a single “find and replace all” entry. But I will eventually need blue’s value. I already have a workaround for that too, but if I could do the merge with one “find and replace all” entry from the start, I’d be happy. Thank you!

      • Coises @gamophyte

        @gamophyte

        When faced with a problem like that, I usually begin by finding a way to sort the data so that the lines I need to merge are adjacent.
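The sort-first idea can be sketched in Python. Assumptions: blank lines may be dropped, non-red/blue lines may move to the end, and `sort_key`/`merge_after_sorting` are hypothetical names. The key is numeric because a plain text sort would put red.10 before red.2 and blue.N before red.N.

```python
import re

ENTRY = re.compile(r"(red|blue)\.(\d+) = ")


def sort_key(line):
    """Order red.N/blue.N lines by number, red before blue.

    Lines that are not red/blue entries sort after them (assumption: other
    content may be moved to the end).
    """
    m = ENTRY.match(line)
    if m is None:
        return (float("inf"), 2)
    return (int(m.group(2)), 0 if m.group(1) == "red" else 1)


def merge_after_sorting(text):
    # Drop blank lines, then sort so each red.N sits directly above blue.N.
    lines = sorted((ln for ln in text.splitlines() if ln.strip()), key=sort_key)
    adjacent = "\n".join(lines)
    # With each pair adjacent, a pattern that crosses a single \n converts
    # every pair in one "replace all".
    return re.sub(r"^red\.(\d\d?) = (.*)\nblue\.\1 = .*$",
                  r"purple.\1 = \2", adjacent, flags=re.M)


old_config = (
    "red.1 = a\nred.2 = b\nred.3 = c\n\n"
    "blue.1 = x\nblue.2 = y\nblue.3 = z\n"
)
print(merge_after_sorting(old_config))
# purple.1 = a
# purple.2 = b
# purple.3 = c
```

The trade-off is that the original top-to-bottom order of the params is lost.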

        • gamophyte @Coises

          @Coises Hey, thank you for the tip! It’s funny I didn’t think of this, but that might be what I have to do, given that I have other finds matching across a line break (\n) that work perfectly with “replace all”.

          Maybe, subconsciously, the reason I chose not to is to let a would-be reviewer easily check whether everything came over to the new config style, knowing where params sat (top to bottom) in the old style.

          I will see if anyone has any other ideas, but I might have to just get over my wish for keeping params placed where they are.

          • Terry R @gamophyte

            @gamophyte said in Looking for a more efficient Regex to merge lines from different parts of the doc into one:

            I will see if anyone has any other ideas, but I might have to just get over my wish for keeping params placed where they are.

            I have been working on this and think I have a solution. But…
            I’m trying to make sense of what you are trying to do. From your example (using the same values for both red and blue doesn’t help), it seems that you want to change the word “red” to “purple”, but only if the equivalent “blue” line exists for the same number. As for the blue lines, you just want to erase them, without saving any of their values. What also confused me was that your title says to “merge” two lines into one. That suggests the value of the blue line is transferred to the red line, which is then renamed to purple.

            I just need to know exactly what you are intending to do. When I was looking at a solution I used the following example lines:

            red.1 = a setting value 1
            red.2 = another setting value 1
            red.3 = yet another setting value 1
            red.4 = no matching value in blue
            
            blue.1 = a setting value 2
            blue.2 = another setting value 2
            blue.3 = yet another setting value 2
            

            Could you provide the lines you expect to result from the “merge”?

            Terry

            • Coises @gamophyte

              @gamophyte said in Looking for a more efficient Regex to merge lines from different parts of the doc into one:

              Maybe, subconsciously, the reason I chose not to is to let a would-be reviewer easily check whether everything came over to the new config style, knowing where params sat (top to bottom) in the old style.

              I will see if anyone has any other ideas, but I might have to just get over my wish for keeping params placed where they are.

              Depending on how important this is, how often it will be used and how much time you have to debug it, you might be better off writing a script. Then you can make it do precisely what you want.

              Within Notepad++, the most commonly used scripting language is Python. I don’t know that language, but if you do, see the Python Script plugin. There are definitely folks here who can help you with any problems you encounter.
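As an illustration of the script route, here is a minimal plain-Python sketch of the red/blue merge that needs no multi-line regex and keeps every surviving line where it was. The function name `merge_red_blue` is hypothetical. Inside the Python Script plugin one would presumably feed it `editor.getText()` and write the result back with `editor.setText()`, but treat that wiring as an assumption.

```python
import re

ENTRY = re.compile(r"^(red|blue)\.(\d\d?) = (.*)$")


def merge_red_blue(text):
    """Rename red.N to purple.N when a blue.N line exists; delete blue lines.

    Two passes over the lines, so the surviving lines keep their order.
    """
    lines = text.split("\n")
    # Pass 1: remember which numbers have a blue line.
    blue_numbers = {m.group(2) for m in map(ENTRY.match, lines)
                    if m and m.group(1) == "blue"}
    # Pass 2: rewrite matching red lines and drop blue lines.
    out = []
    for line in lines:
        m = ENTRY.match(line)
        if m and m.group(1) == "blue":
            continue
        if m and m.group(2) in blue_numbers:
            out.append(f"purple.{m.group(2)} = {m.group(3)}")
        else:
            out.append(line)
    return "\n".join(out)
```

Because the script first collects all blue numbers and then rewrites, nothing depends on how far apart the red and blue lines are.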

              • gamophyte @Coises

                @Coises Thanks again. I do have that plugin, but I don’t know how to script at all. Still, I’ve learned Notepad++ regex quickly, so I can probably take a whack at it. I’m glad it’s something that’s on the table.

                • gamophyte @Terry R

                  @Terry-R Thanks for having a look at this.

                  My simplification was on purpose: I only wanted to highlight that when a “find” stores multiple lines in a (...) capture, those lines can’t be reused to keep the “replace all” going continuously.

                  Likely the reason is that the lines hoovered up by the (...) capture aren’t there to be “found” by the next match.

                  My red/blue/purple example is fiction, but it illustrates the issue. What I’m actually doing is a kind of check: if a setting exists in the old config file, I need to add extra lines.

                  If you’re still curious, here is the old config converted to the new one, up to the point where I need the multi-line “find”:

                  pfk.1.quick_dial = P111
                  pfk.2.quick_dial = 222
                  pfk.3.quick_dial = 333
                  pfk.4.quick_dial = 444
                  pfk.15.quick_dial = 15P15P15
                  pfk.16.quick_dial = 16P16P16
                  
                  autodial_settings.cfg.autodial_1_IsPrefix=1
                  autodial_settings.cfg.autodial_2_IsPrefix=0
                  autodial_settings.cfg.autodial_3_IsPrefix=0
                  autodial_settings.cfg.autodial_4_IsPrefix=0
                  autodial_settings.cfg.autodial_15_IsPrefix=0
                  autodial_settings.cfg.autodial_16_IsPrefix=1
                  

                  It goes from 1 to 16, but I compressed it for illustration. pfk.1.quick_dial is the new setting, which has already been converted from the old one (it was an easy 1:1 swap).

                  However, if the older setting autodial_settings.cfg.autodial_1_IsPrefix=1 exists further down the config, pfk.1.quick_dial actually needs two extra setting lines:

                  pfk.1.prefix = P111
                  pfk.1.feature = prefix dial
                  

                  Note that the prefix has the same value as the quick dial, and the quick dial line still needs to exist. These two new lines can go anywhere in the file, so it’s fine that they appear where the old autodial_settings.cfg.autodial_1_IsPrefix=1 setting was.

                  Easy enough with what I already know…
                  Find: pfk.(\d\d?).quick_dial = (.*)((?s).*?)autodial_settings.cfg.autodial_\1_IsPrefix=1
                  Replace All: pfk.\1.quick_dial = \2\3\npfk.\1.feature = prefix dial\npfk.\1.prefix = \2

                      [Storing all the lines ((?s).*?) to put them all back]

                  So when done…

                  pfk.1.quick_dial = P111
                  pfk.2.quick_dial = 222
                  pfk.3.quick_dial = 333
                  pfk.4.quick_dial = 444
                  pfk.15.quick_dial = 15P15P15
                  pfk.16.quick_dial = 16P16P16
                  
                  pfk.1.prefix = P111
                  pfk.1.feature = prefix dial
                  
                  autodial_settings.cfg.autodial_2_IsPrefix=0
                  autodial_settings.cfg.autodial_3_IsPrefix=0
                  autodial_settings.cfg.autodial_4_IsPrefix=0
                  autodial_settings.cfg.autodial_15_IsPrefix=0
                  autodial_settings.cfg.autodial_16_IsPrefix=1
                  

                  Notice that autodial_settings.cfg.autodial_1_IsPrefix=1 is now gone. The settings that are “0” (disabled) can stay, because a line cleaner runs later and scrubs the old setting namespaces.

                  But done this way, as I said, I need another entry in my shortcuts.xml to process the next enabled prefix, which is quick dial 16. And since I don’t know which prefixes each device will enable, I have to add 16 entries, when I wish I could do just one or two.

                  That’s what’s nice about a “replace all” matching \r\n (when the target is on the very next line): it doesn’t have to store multiple lines to put them back, so a single “replace all” is enough.

                  • Terry R @gamophyte

                    @gamophyte said in Looking for a more efficient Regex to merge lines from different parts of the doc into one:

                    My simplification was on purpose in that I only wanted to highlight the fact when having multiple lines in the “find” being (…) stored, they can’t be used to keep the “replace all” going continuously.

                    I made some assumptions based on your original example. It turns out you really wanted something quite different (sometimes simplifying example data actually obscures the problem). I tried reading your latest example but gave up, so I will give you what I created based on your first example. If I put my mind to it I might revisit your “real” data example later, but honestly I gave up because I felt somewhat deceived. In this solution I’m really just changing the word “red” to “purple” and keeping the original value. But a revised version of this solution would also work if the “blue” value were to be written over the original “red” value.

                    It uses a lookahead and alternation.
                    Find What:(?-s)^red\.(\d\d?) = (?=(?:.*\R)*^blue\.\1)|^(blue).*\R?
                    Replace With:(?2:purple.\1 = )

                    In short, it checks whether the current line starts with “red” and is followed, somewhere later, by a “blue” line with the same number. If not, it checks whether the current line begins with “blue”. So a qualifying “red” line has “red” changed to “purple”, and a “blue” line is erased. The replacement (?2:purple.\1 = ) is a conditional: if group 2 (the blue alternative) matched, nothing is written; otherwise purple.\1 = is written.
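Terry R's pattern uses two Boost features Python's `re` lacks (`\R` and the `(?2:...)` conditional replacement), so a Python sketch of the same lookahead-plus-alternation idea substitutes `\n` for `\R` and a hypothetical function `choose_replacement` for the conditional. It also adds ` = ` after the backreference, a deliberate deviation: without it, red.1 could be satisfied by a later blue.12 once numbers reach two digits.

```python
import re

# Lookahead + alternation, translated to Python's re. Assumptions: \n line
# endings (Python's re has no \R), and " = " after the backreference so that
# red.1 is not satisfied by a later blue.12.
PATTERN = re.compile(
    r"^red\.(\d\d?) = (?=(?:.*\n)*^blue\.\1 = )"  # red.N with a later blue.N
    r"|^(blue).*\n?",                              # any blue line
    re.M,
)


def choose_replacement(m):
    # Stands in for the Boost conditional (?2:purple.\1 = ), which Python's
    # re does not support: blue lines become "", red prefixes become purple.
    if m.group(2):
        return ""
    return f"purple.{m.group(1)} = "


print(PATTERN.sub(choose_replacement, "red.1 = a\nred.2 = b\n\nblue.2 = y\n"))
```

Because the red alternative only consumes "red.N = ", the rest of each red line, and every following red line, stays available to later matches in the same pass.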

                    See my 2 images, a before and an after one. This was done using the above regex in a single pass, which appears to be the real problem you were trying to solve. So I suppose in essence this is a teaching session to show you another possible way, using a lookahead, instead of capturing all the in-between lines and thus losing the ability to complete all changes in a single pass. A lookahead can capture text ahead of the caret and bring that data back to the current line when writing the replacement. Maybe it hasn’t been something you were aware of?
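To see what capturing ahead of the caret means, this Python sketch uses the lookahead from Terry R's revised pattern later in the thread (with `\n` in place of `\R`, an assumption since Python's `re` has no `\R`). The match ends at the end of the red line even though group 2 holds text taken from the blue line.

```python
import re

text = "red.1 = old\nred.2 = keep\nblue.1 = new\n"

# The lookahead looks past the current line and captures blue.1's value in
# group 2, but consumes nothing, so the match stops at the end of red.1.
m = re.search(r"^red\.(\d\d?) = .*(?=(?:.*\n)*^blue\.\1 = (.*))", text, re.M)
print(repr(m.group(0)))  # 'red.1 = old'
print(repr(m.group(2)))  # 'new'
print(m.end())           # 11: the next search starts here, before red.2
```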

                    Since you do seem to be reasonably proficient with regex, I’ll leave my idea with you to massage to fit your “real” data.
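One possible single-pass adaptation of the lookahead idea to gamophyte's real pfk data, as a Python sketch. It is not from the thread and has not been tried in Notepad++. Assumptions: `\n` line endings (in Notepad++ this would presumably be `\R`); the two new lines may go directly under the quick_dial line, since gamophyte says their position doesn't matter; and the old IsPrefix lines are left for the existing line cleaner.

```python
import re

# Match a quick_dial line only if a later line enables the prefix for the
# same number. "=1$" rejects values such as 10, and "_\1_" keeps pfk.1 from
# matching autodial_16.
FIND = re.compile(
    r"^pfk\.(\d\d?)\.quick_dial = (.*)$"
    r"(?=(?:.*\n)*^autodial_settings\.cfg\.autodial_\1_IsPrefix=1$)",
    re.M,
)
# Keep the quick_dial line (\g<0>) and append the two new setting lines.
REPLACE = r"\g<0>\npfk.\1.prefix = \2\npfk.\1.feature = prefix dial"

config = (
    "pfk.1.quick_dial = P111\n"
    "pfk.2.quick_dial = 222\n"
    "pfk.15.quick_dial = 15P15P15\n"
    "pfk.16.quick_dial = 16P16P16\n"
    "\n"
    "autodial_settings.cfg.autodial_1_IsPrefix=1\n"
    "autodial_settings.cfg.autodial_2_IsPrefix=0\n"
    "autodial_settings.cfg.autodial_15_IsPrefix=0\n"
    "autodial_settings.cfg.autodial_16_IsPrefix=1\n"
)
new_text, count = FIND.subn(REPLACE, config)
print(count)  # 2: pfk.1 and pfk.16, in one "replace all"
print(new_text)
```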

                    Terry
                    [Image: replace1.jpg (before)]

                    [Image: replace2.jpg (after)]

                    • Terry R @Terry R

                      @Terry-R said in Looking for a more efficient Regex to merge lines from different parts of the doc into one:

                      But this solution (revised) would also work if the “blue” value was to be written over the original “red” value.

                      As stated in my last post, a revised version of my solution would update the “red” value with the “blue” value, yet still complete in a single pass. The revised regex is:
                      Find What:(?-s)^red\.(\d\d?) = .*(?=(?:.*\R)*^blue\.\1 = (.*))|^(blue).*\R?
                      Replace With:(?3:purple.\1 = \2 )
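A Python sketch of the revised regex, run on Terry R's test lines. Assumptions: `\n` replaces `\R`, and a hypothetical function `choose_replacement` stands in for the `(?3:...)` conditional replacement, neither of which Python's `re` supports as written. It should produce three purple lines carrying blue's values, leave red.4 alone, and delete the blue lines.

```python
import re

PATTERN = re.compile(
    r"^red\.(\d\d?) = .*(?=(?:.*\n)*^blue\.\1 = (.*))"  # red.N; grab blue.N's value
    r"|^(blue).*\n?",                                      # any blue line
    re.M,
)


def choose_replacement(m):
    if m.group(3):  # the blue alternative matched: delete the line
        return ""
    # Otherwise rewrite the whole red line with blue's value (group 2).
    return f"purple.{m.group(1)} = {m.group(2)}"


old_config = (
    "red.1 = a setting value 1\n"
    "red.2 = another setting value 1\n"
    "red.3 = yet another setting value 1\n"
    "red.4 = no matching value in blue\n"
    "\n"
    "blue.1 = a setting value 2\n"
    "blue.2 = another setting value 2\n"
    "blue.3 = yet another setting value 2\n"
)
print(PATTERN.sub(choose_replacement, old_config))
```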

                      See this image which is the updated version of my original “after”.

                      [Image: replace3.jpg]

                      Terry

                      • gamophyte @Terry R

                        @Terry-R You’re amazing!

                        Sorry if it seemed I was keeping anything from you; it was the best example I had to illustrate my issue concisely. Even my second post with the real application had the same issue, but added way too much information.

                        Regardless, you found my weakness here: I didn’t know about lookaheads. I appreciate your compliment about my regex knowledge, but I’ve only hobbled along using cheat-sheet examples, and only recently started to see full expressions in my head. Thank you again!

                        I see your revision too. I will be applying it later tonight.

                        • Terry R @gamophyte

                          @gamophyte

                          Two good websites I’ve used during my learning process (yes, I too was once where you are now) are:
                          www.regex101.com (look at this FAQ post for details on the regular expression engine used in Notepad++)
                          and
                          rexegg.com which has a lot of useful information in a neat concise format.

                          Good luck
                          Terry

                          • gamophyte @Terry R

                            @Terry-R Excellent!! Thank you!

                            The Community of users of the Notepad++ text editor.