Looking for a more efficient Regex to merge lines from different parts of the doc into one
-
When faced with a problem like that, I usually begin by finding a way to sort the data so that the lines I need to merge are adjacent.
-
@Coises Hey, thank you for the tip! It’s funny I didn’t think about this, but that might be what I have to do, given I have other examples of a line-break \n “find” working with “replace all” perfectly.
Maybe subconsciously the reason I chose not to is to let a would-be viewer easily check that everything came over to the new config style, knowing where params sat, top to bottom, in the old style.
I will see if anyone has any other ideas, but I might have to just get over my wish for keeping params placed where they are.
-
@gamophyte said in Looking for a more efficient Regex to merge lines from different parts of the doc into one:
I will see if anyone has any other ideas, but I might have to just get over my wish for keeping params placed where they are.
I have been working on this and think I have a solution. But…
I’m trying to make sense of what you are trying to do. From your example (using the same values for both red and blue doesn’t help) it seems that you want to update the word “red” to “purple”, but only if the equivalent “blue” word exists for the same number. And as for the blue lines, you just want to erase those, not saving any of their values. What also confused me was that your title says to “merge” two lines into one. That suggests the value of the blue line is transferred to the red line, which is then updated to say purple.
I just need to know exactly what you are intending to do. When I was looking at a solution I used the following example lines:
red.1 = a setting value 1
red.2 = another setting value 1
red.3 = yet another setting value 1
red.4 = no matching value in blue
blue.1 = a setting value 2
blue.2 = another setting value 2
blue.3 = yet another setting value 2
Could you provide the lines you expect to see resulting from the “merge”?
Terry
-
@gamophyte said in Looking for a more efficient Regex to merge lines from different parts of the doc into one:
Maybe subconsciously the reason I chose not to is to let a would-be viewer easily check that everything came over to the new config style, knowing where params sat, top to bottom, in the old style.
I will see if anyone has any other ideas, but I might have to just get over my wish for keeping params placed where they are.
Depending on how important this is, how often it will be used and how much time you have to debug it, you might be better off writing a script. Then you can make it do precisely what you want.
Within Notepad++, the most commonly used scripting language is Python. I don’t know that language, but if you do, see the Python Script plugin. There are definitely folks here who can help you with any problems you encounter.
-
@Coises Thanks again. I do have that plugin but I don’t know how to script at all. Regardless, I’ve learned notepad++ regex quickly, I can probably take a whack at it. I’m glad it’s something that’s on the table.
-
@Terry-R Thanks for having a look at this.
My simplification was on purpose: I only wanted to highlight the fact that when multiple lines in the “find” are stored in a (...) group, they can’t be used to keep the “replace all” going continuously. Likely the reason is that the lines it hoovered up into the (...) group are no longer there to be “found” by the next match.
My red blue purple is fiction, but it illustrates the issue. The actual thing I’m doing is a kind of check: if a setting exists in the old config file, I need to add extra lines.
If you’re still curious… here is the old config converted to new, up to the point of needing the multi-line “find”.
pfk.1.quick_dial = P111
pfk.2.quick_dial = 222
pfk.3.quick_dial = 333
pfk.4.quick_dial = 444
pfk.15.quick_dial = 15P15P15
pfk.16.quick_dial = 16P16P16
autodial_settings.cfg.autodial_1_IsPrefix=1
autodial_settings.cfg.autodial_2_IsPrefix=0
autodial_settings.cfg.autodial_3_IsPrefix=0
autodial_settings.cfg.autodial_4_IsPrefix=0
autodial_settings.cfg.autodial_15_IsPrefix=0
autodial_settings.cfg.autodial_16_IsPrefix=1
It goes from 1-16 but I compressed it for illustration. pfk.1.quick_dial is the new setting, which has already been converted from the old one by this point (it was an easy 1:1 swap).
However, if this older setting autodial_settings.cfg.autodial_1_IsPrefix=1 exists further down the config, that means pfk.1.quick_dial actually needs two extra settings lines…
pfk.1.prefix = P111
pfk.1.feature = prefix dial
Note the prefix will be the same value as the quick dial, and the quick dial still needs to exist. These two new lines can be created anywhere in the file, so it’s fine that they appear where the old autodial_settings.cfg.autodial_1_IsPrefix=1 setting was.
Easy enough with what I already know…
Find: pfk.(\d\d?).quick_dial = (.*)((?s).*?)autodial_settings.cfg.autodial_\1_IsPrefix=1
Replace All: pfk.\1.quick_dial = \2\3\npfk.\1.feature = prefix dial\npfk.\1.prefix = \2
[Storing all the lines ((?s).*?) to put them all back]
So when done…
pfk.1.quick_dial = P111
pfk.2.quick_dial = 222
pfk.3.quick_dial = 333
pfk.4.quick_dial = 444
pfk.15.quick_dial = 15P15P15
pfk.16.quick_dial = 16P16P16
pfk.1.prefix = P111
pfk.1.feature = prefix dial
autodial_settings.cfg.autodial_2_IsPrefix=0
autodial_settings.cfg.autodial_3_IsPrefix=0
autodial_settings.cfg.autodial_4_IsPrefix=0
autodial_settings.cfg.autodial_15_IsPrefix=0
autodial_settings.cfg.autodial_16_IsPrefix=1
Notice the autodial_settings.cfg.autodial_1_IsPrefix=1 is gone now. Then for the settings that are “0” meaning “disabled”, they can stay as there is a line cleaner that comes and scrubs the old setting namespaces.
But doing it this way, as I said, I have to add another XML entry in my shortcuts.xml to process the next enabled prefix, which is quick dial 16. And I don’t know what each device will have enabled, so I have to add 16 entries; I wish I could do just one or two.
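The one-match-per-pass behaviour described above can be reproduced outside Notepad++. The following is only a sketch using Python's `re` module, not the Boost engine Notepad++ uses; the shortened sample data, the escaped dots, and the `[\s\S]*?` stand-in for `((?s).*?)` are assumptions made for the illustration (Python does not accept an inline `(?s)` in the middle of a pattern).

```python
import re

# Shortened sample based on the post; entries 1 and 16 have IsPrefix=1.
config = "\n".join([
    "pfk.1.quick_dial = P111",
    "pfk.2.quick_dial = 222",
    "pfk.16.quick_dial = 16P16P16",
    "autodial_settings.cfg.autodial_1_IsPrefix=1",
    "autodial_settings.cfg.autodial_2_IsPrefix=0",
    "autodial_settings.cfg.autodial_16_IsPrefix=1",
])

# Same shape as the Find expression: group 3 swallows every line between
# the quick_dial line and its matching IsPrefix=1 line.
pattern = re.compile(
    r"pfk\.(\d\d?)\.quick_dial = (.*)([\s\S]*?)"
    r"autodial_settings\.cfg\.autodial_\1_IsPrefix=1"
)
replacement = (
    r"pfk.\1.quick_dial = \2\3"
    r"\npfk.\1.feature = prefix dial\npfk.\1.prefix = \2"
)

# subn returns the new text and the number of replacements made.
result, count = pattern.subn(replacement, config)
print(count)
print(result)
```

The first match consumes the pfk.16 line as part of group 3, so by the time the search resumes there is no pfk.16.quick_dial line left to start a second match from. That is why a single "replace all" only handles one pair.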
That’s what’s nice about “replace all” using \r\n (when it’s just the next line): it doesn’t have to store multiple lines to put them back, and all I need to do is “replace all” once.
-
@gamophyte said in Looking for a more efficient Regex to merge lines from different parts of the doc into one:
My simplification was on purpose in that I only wanted to highlight the fact when having multiple lines in the “find” being (…) stored, they can’t be used to keep the “replace all” going continuously.
I made some assumptions based on your original example. Turns out you really wanted something quite different it seems (sometimes simplifying example data actually obfuscates the problem). Since I tried reading your latest example but gave up I will give you what I created based on your first example. If I put my mind to it I might later on revisit your “real” data example, but honestly I gave up because I felt deceived in a way. In this solution I’m in reality just updating the word “red” to “purple” and keeping the original value. But this solution (revised) would also work if the “blue” value was to be written over the original “red” value.
It uses a lookahead and alternation.
Find What: (?-s)^red\.(\d\d?) = (?=(?:.*\R)*^blue\.\1)|^(blue).*\R?
Replace With: (?2:purple.\1 = )
So in short, it looks for the current line to start with a “red” and have a following “blue” of the same number. If that doesn’t occur then it looks to see if the current line begins with a “blue”. So a “red” line that fits will have “red” changed to “purple” and if the current line is a “blue” then it will be erased.
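For anyone who wants to experiment with the lookahead-plus-alternation idea outside Notepad++, here is a rough Python sketch. It is an adaptation, not the regex above verbatim: Python's `re` has no `\R` and no `(?2:...)` conditional replacement, so `\n` and a replacement function are used instead, and the `\b` after `\1` is an extra guard (an assumption of this sketch) so that red.1 cannot pair with something like blue.10.

```python
import re

text = "\n".join([
    "red.1 = a setting value 1",
    "red.2 = another setting value 1",
    "red.3 = yet another setting value 1",
    "red.4 = no matching value in blue",
    "blue.1 = a setting value 2",
    "blue.2 = another setting value 2",
    "blue.3 = yet another setting value 2",
]) + "\n"

# First alternative: a "red.N = " prefix, but only if some later line
# starts with "blue.N". The lookahead checks this without consuming text.
# Second alternative: any whole "blue" line, including its line ending.
pattern = re.compile(
    r"^red\.(\d\d?) = (?=(?:.*\n)*^blue\.\1\b)|^(blue).*\n?",
    re.MULTILINE,
)

def swap(match):
    # Group 2 is set only when the "blue" alternative matched: delete it.
    if match.group(2) is not None:
        return ""
    # Otherwise rename the matched "red.N = " prefix to "purple.N = ".
    return f"purple.{match.group(1)} = "

result = pattern.sub(swap, text)
print(result)
```

Because the lookahead consumes nothing, each red line is matched on its own and the blue lines remain available for the later matches, so one pass handles every pair.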
See my 2 images, a before and an after one. This was done using the above regex with a “single” pass, which appears to be the real problem you were trying to solve. So I suppose in essence this is a teaching session to show you another possible way using a lookahead, instead of capturing all the in between lines, and thus denying the ability to complete all changes in a single pass. A lookahead can allow capturing in front of the caret and bringing that data back to the current line when writing it back. Maybe it hasn’t been something you were aware of?
Since you do seem to be reasonably proficient with regex I’ll leave my idea with you to massage to fit your “real” data.
Terry
-
@Terry-R said in Looking for a more efficient Regex to merge lines from different parts of the doc into one:
But this solution (revised) would also work if the “blue” value was to be written over the original “red” value.
As stated in the last email, a revised version of my solution would update the “red” value with the “blue” value, yet still allow for a single pass to complete. So the revised regex is:
Find What: (?-s)^red\.(\d\d?) = .*(?=(?:.*\R)*^blue\.\1 = (.*))|^(blue).*\R?
Replace With: (?3:purple.\1 = \2 )
See this image which is the updated version of my original “after”.
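The revised idea, capturing the “blue” value inside the lookahead and writing it onto the “red” line, can be tried in Python as well. Again this is only a sketch under the same assumptions as any Python port: `\n` replaces `\R`, and a replacement function stands in for the `(?3:...)` conditional, which Python's `re` does not support.

```python
import re

text = "\n".join([
    "red.1 = a setting value 1",
    "red.2 = another setting value 1",
    "red.3 = yet another setting value 1",
    "red.4 = no matching value in blue",
    "blue.1 = a setting value 2",
    "blue.2 = another setting value 2",
    "blue.3 = yet another setting value 2",
]) + "\n"

# Group 2 sits inside the lookahead: it captures the value of the later
# "blue.N" line while the match itself only covers the "red.N" line.
pattern = re.compile(
    r"^red\.(\d\d?) = .*(?=(?:.*\n)*^blue\.\1 = (.*))|^(blue).*\n?",
    re.MULTILINE,
)

def merge(match):
    # Group 3 is set only for "blue" lines, which are removed entirely.
    if match.group(3) is not None:
        return ""
    # Rewrite the red line as purple, carrying over the blue value.
    return f"purple.{match.group(1)} = {match.group(2)}"

result = pattern.sub(merge, text)
print(result)
```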
Terry
-
@Terry-R You’re amazing!
Sorry if I was keeping anything from you, it was the best example to illustrate my issue concisely. Even in the second post with the real application, it was the same issue yet added way too much information.
Regardless you found my weakness here, I didn’t know about using a lookahead. I take your compliment about my regex knowledge so far, but I’ve only hobbled along using cheat sheet examples until I started barely seeing full expressions in my head. Thank you again!
I see your revision too. I will be applying this later in the night.
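One way the lookahead idea might carry over to the pfk data from earlier in the thread is sketched below in Python. This is an untested-in-Notepad++ illustration, not the final expression used for the real config: the shortened sample, the choice to write the two new lines directly under the quick_dial line, and the helper name `expand` are all assumptions.

```python
import re

config = "\n".join([
    "pfk.1.quick_dial = P111",
    "pfk.2.quick_dial = 222",
    "pfk.15.quick_dial = 15P15P15",
    "pfk.16.quick_dial = 16P16P16",
    "autodial_settings.cfg.autodial_1_IsPrefix=1",
    "autodial_settings.cfg.autodial_2_IsPrefix=0",
    "autodial_settings.cfg.autodial_15_IsPrefix=0",
    "autodial_settings.cfg.autodial_16_IsPrefix=1",
]) + "\n"

# First alternative: a quick_dial line whose number has a later
# IsPrefix=1 line (checked by lookahead, so nothing extra is consumed).
# Second alternative: the old IsPrefix=1 lines themselves.
pattern = re.compile(
    r"^pfk\.(\d\d?)\.quick_dial = (.*)$"
    r"(?=[\s\S]*^autodial_settings\.cfg\.autodial_\1_IsPrefix=1$)"
    r"|^autodial_settings\.cfg\.autodial_\d\d?_IsPrefix=1\n?",
    re.MULTILINE,
)

def expand(match):
    number, value = match.group(1), match.group(2)
    if number is None:
        # An old IsPrefix=1 line: drop it.
        return ""
    # Keep the quick_dial line and add the two extra settings after it.
    return (
        f"pfk.{number}.quick_dial = {value}\n"
        f"pfk.{number}.prefix = {value}\n"
        f"pfk.{number}.feature = prefix dial"
    )

result = pattern.sub(expand, config)
print(result)
```

The IsPrefix=0 lines are left alone here, on the assumption that the separate line cleaner mentioned earlier removes them.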
-
2 good websites I’ve used during my learning process (yes I too was once where you are now) are:
www.regex101.com (look at this FAQ post for details on the regular expression engine used in Notepad++)
and
rexegg.com which has a lot of useful information in a neat concise format.
Good luck
Terry
-
@Terry-R Excellent!! Thank you!