Regex to replace ALL characters in captured group of a search string with the same character when on one or two lines
-
I want a regex search/replace that does the following:
(search string is in one line with a single space between the “:” and “<”)
Message-ID: <S3HC1075.C9B9D2W4@...>
Message-ID: <F2CA98A6AWK-LDLL2WQVUFZWSA-8N4-O2WAX-DHGJ2@x>
Message-ID: <20230112021536.D44C04F585AAECBA@...>
Message-ID: <CAAGbbBVLx-B8sc2ifvzrHEUHTYEvSxkK5=OcDWxk=UKM3eMo2g@...>
Message-ID: <347545688.95523.1695404577920.JavySail.boot@...>
(desired output)
Message-ID: <S3HC_________D2W4@...>
Message-ID: <F2CA__________________________________HGJ2@x>
Message-ID: <2023_______________________ECBA@...>
Message-ID: <CAAG___________________________________________Mo2g@...>
Message-ID: <3475___________________________________boot@...>
(search string is in two consecutive lines with no space after the “:” on the 1st line and a space at the beginning of the 2nd line in front of the “<”)
Message-ID:
 <B7B207D39A1D4104860D8073A027CCD51DACD65B146E@...>
Message-ID:
 <286f14f5-bb9d-4c26-ae24-2d4ae7484bb3@...>
(desired output)
Message-ID:
 <B7B2____________________________________146E@...>
Message-ID:
 <286f____________________________4bb3@...>
I have tried numerous variations of regex search strings without success, such as this for the string in the same line:
(Message-ID:[ ])(<[A-z0-9]{4})(.*?)(.{4})@
This regex search ALSO finds the string okay when it is in one line but it skips over the string when it is in two lines:
(Message-ID:[ \r\n])(<[A-z0-9]{4})(.*?)(.{4})@
1- What needs to be done to capture both scenarios in a single search?
I have tried numerous attempts to get the replace to work, but have also been unsuccessful in that. The following replaces groups 1, 2, and 4 correctly for the search above, but doesn’t replace all characters in group 3 with the underscore “_” character. It just produces a single “_” and then the input characters from group 3:
(replace with)
\1\2(_{\3})\4@
2- What needs to be done to get all characters in group 3 to be changed to “_”?
-
@asapReps said in Regex to replace ALL characters in captured group of a search string with the same character when on one or two lines:
This regex search ALSO finds the string okay when it is in one line but it skips over the string when it is in two lines:
(Message-ID:[ \r\n])(<[A-z0-9]{4})(.*?)(.{4})@
1- What needs to be done to capture both scenarios in a single search?
This part is easy, so let’s do it first. By itself,
[ \r\n]
matches exactly one character: a space, a carriage return or a line feed. What you wanted was probably:
(Message-ID:[ \r\n]*)(<[A-z0-9]{4})(.*?)(.{4})@
which means the colon can be followed by any number of spaces, carriage returns and/or line feeds. If you wanted to say that there must be either no line break or exactly one line break, followed by exactly one space, then use:
(Message-ID:\R? )(<[A-z0-9]{4})(.*?)(.{4})@
instead.
The second part is harder, because there is no simple way to identify a sub-string in a regular expression and then replace every individual character in it with some other character. If I can work out a solution, I’ll propose it in another comment.
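For anyone who wants to check the difference between the three search variants outside Notepad++, here is a minimal Python sketch. It rests on some assumptions that are not part of the thread: Python's `re` module has no `\R`, so the optional line break is spelled out as an explicit alternation, and `[A-Za-z0-9]` is used in place of `[A-z0-9]` (the range `A-z` also covers the punctuation characters that sit between `Z` and `a` in ASCII). The variable names are made up for illustration.

```python
import re

# Assumption for this sketch: Python's re has no \R, so one line break is
# written out explicitly as CRLF, LF or CR.
EOL = r"(?:\r\n|\n|\r)"
REST = r"(<[A-Za-z0-9]{4})(.*?)(.{4})@"

one_char = re.compile(r"(Message-ID:[ \r\n])" + REST)          # exactly one character
any_run = re.compile(r"(Message-ID:[ \r\n]*)" + REST)          # any run of them
strict = re.compile(r"(Message-ID:" + EOL + r"? )" + REST)     # optional break, then one space

same_line = "Message-ID: <S3HC1075.C9B9D2W4@...>"
two_lines = "Message-ID:\n <286f14f5-bb9d-4c26-ae24-2d4ae7484bb3@...>"

for name, rx in [("one_char", one_char), ("any_run", any_run), ("strict", strict)]:
    # Report whether each variant finds the one-line and the two-line form.
    print(name, bool(rx.search(same_line)), bool(rx.search(two_lines)))
```

Only the first variant misses the two-line form: its character class consumes the line break, and the next character is then a space rather than the expected `<`.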
-
@asapReps said in Regex to replace ALL characters in captured group of a search string with the same character when on one or two lines:
1- What needs to be done to capture both scenarios in a single search?
I have tried numerous attempts to get the replace to work, but have also been unsuccessful in that. The following replaces groups 1, 2, and 4 correctly for the search above, but doesn’t replace all characters in group 3 with the underscore “_” character. It just produces a single “_” and then the input characters from group 3:
(replace with)
\1\2(_{\3})\4@
2- What needs to be done to get all characters in group 3 to be changed to “_”?
Very interesting problem you had. Unfortunately you haven’t understood the replacement rules for regex. You cannot use (_{\3}) and expect it to produce a varying number of underscore characters based on the find string. It doesn’t work like that. However, all is not lost: I have created a solution which I hope will work for you, and which you can also understand in case you need to adjust it to suit your requirements.
Firstly though, your first question about a single regex to find the strings in both scenarios. Here it is:
(?-s)(Message-ID:\R? )(<[A-z0-9]{4})((.{32})?(.{16})?(.{8})?(.{4})?(..)?(.)?)(.{4})@
Now to the major problem: how to replace with a varying number of underscore characters based on the number of characters found. My find regex attempts to match the longest chunk first, then the next longest and so on, using the lengths 32, 16, 8, 4, 2 and 1. This gives a maximum of 63 masked characters plus the 4 kept at the start and the 4 kept at the end of the string, so 71 in total. If your string could be longer, insert
(.{64})?
before the
(.{32})?
to increase the maximum string length to 135 characters (every later group number then shifts up by one, so the replacement needs a matching 64-underscore term and renumbered groups). I’m hoping by now you get the concept I’m using. In the replacement we use
(?4
as a conditional: if group 4 in the find regex matched, the replacement emits the text that follows inside the parentheses. The underscores have to be written out literally: 32 of them for group 4, 16 for group 5, and so on.
${1}${2}(?4________________________________)(?5________________)(?6________)(?7____)(?8__)(?9_)${10}@
This is probably a lot to take in, so take your time. I’d suggest pasting my find regex into the regex101.com website to see how it matches. Unfortunately that site doesn’t understand the replacement expression, but it will work in Notepad++. That is probably because the Notepad++ regular expression engine is not EXACTLY one of the flavors available on the website.
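The chunking idea can also be tried outside Notepad++. The following is only a sketch in Python under stated assumptions: Python's `re` has neither `\R` nor the `(?4...)` conditional replacement syntax, so the line break is spelled out and the conditional is emulated with a callback; `[A-Za-z0-9]` stands in for `[A-z0-9]`; the names `SIZES`, `FIND`, `replace` and `mask` are hypothetical.

```python
import re

SIZES = [32, 16, 8, 4, 2, 1]  # chunk lengths from the post, longest first

# Groups 4..9 are the optional chunks, group 10 is the trailing four characters.
chunks = "".join(r"(.{%d})?" % n for n in SIZES)
FIND = re.compile(
    r"(Message-ID:(?:\r\n|\n|\r)? )(<[A-Za-z0-9]{4})(" + chunks + r")(.{4})@"
)

def replace(m):
    # Emulates (?4...)(?5...)...(?9...): emit n underscores only for the
    # chunks that actually took part in the match.
    filler = "".join(
        "_" * n for i, n in enumerate(SIZES, start=4) if m.group(i) is not None
    )
    return m.group(1) + m.group(2) + filler + m.group(10) + "@"

def mask(text):
    return FIND.sub(replace, text)

print(mask("Message-ID: <S3HC1075.C9B9D2W4@...>"))
```

Because every length from 0 to 63 has exactly one decomposition into 32, 16, 8, 4, 2 and 1, the set of chunks that match always adds up to the length of the middle part. An ID whose middle part is longer than 63 characters simply does not match and is left unchanged.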
Terry
-
@asapReps said in Regex to replace ALL characters in captured group of a search string with the same character when on one or two lines:
2- What needs to be done to get all characters in group 3 to be changed to “_”?
The solution I will suggest uses the Columns++ plugin. If you have a recent version of Notepad++ you can install it using Plugins | Plugins Admin….
Though that plugin is mostly designed to work with data organized in columns, its Search dialog has some capabilities that will be useful here.
After installing that plugin, with your file open and nothing selected, open the Search… dialog from the Columns++ menu.
Select Search Mode: Regular expression and paste this:
Message-ID:\R? <[A-z0-9]{4}\K.*?(?=.{4}@)
into the Find box.
This is different from the earlier expression in two important ways. First, the \K causes the selection or replacement to begin from where the \K is encountered, instead of from the beginning. That takes us up to the .* part that’s to be replaced. Following that, we use a look-ahead; enclosing the trailing part of the expression in (?=…) causes that part to be required, but not included in the part to be selected or replaced.
Now, for the first Columns++ trick. Instead of using Find or Replace, click the downward-pointing arrow at the right of the Count button, and then, from the menu that appears, click Select All.
Now you have a multiple selection that includes just the part to be replaced.
The Search function in Columns++ works in regions. In the dialog there is a button labeled Set. Click that button to make the searchable region match the selection.
Now, put a single period in the Find box, a single underscore in the Replace box, and click Replace All. That will replace each character in the indicated regions with an underscore.
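The same two-step idea (first isolate the region, then replace each character inside it) can be sketched in Python. This is an adaptation rather than what the plugin does internally: Python's `re` has no `\K` and no `\R`, so the prefix is captured and written back instead, and the line break is spelled out. `[A-Za-z0-9]` replaces `[A-z0-9]`, and the names `REGION` and `mask` are hypothetical.

```python
import re

# Step 1: find each region. Group 1 is the prefix that \K would have dropped
# from the match, group 2 is the part to be masked; the look-ahead keeps the
# last four characters and the "@" out of the match.
REGION = re.compile(r"(Message-ID:(?:\r\n|\n|\r)? <[A-Za-z0-9]{4})(.*?)(?=.{4}@)")

def mask(text):
    # Step 2: inside each region, replace every single character with "_"
    # (the equivalent of Find "." / Replace "_" restricted to the region).
    return REGION.sub(lambda m: m.group(1) + re.sub(r".", "_", m.group(2)), text)

print(mask("Message-ID: <20230112021536.D44C04F585AAECBA@...>"))
```

Unlike the chunk-based approach, this one has no upper limit on the length of the masked part.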
-
@Coises Thanks for the search solution. It worked fine.
-
@Terry-R Thanks for the response. The solutions work great for all the cases I have tried. I would NEVER have come up with that type of solution. Appreciate the expertise.
-
Hello, @asapRepsnp, @coises, @terry-r and All,
I think that an elegant solution would be to use the generic regex, exposed in this post :
Note that, to take into account the two types of your INPUT text, I used this leading regex part:
(?-is:^Message-ID:(?:|\R)\x20<.{4}
with the non-capturing alternative
(?:|\R)
which means: nothing or a line break, between the colon after Message-ID and the space before the < character.
So:
- Open your file in Notepad++
- Move the caret (cursor):
  - At the very beginning of your file (Ctrl + Home)
  - On a line, or the first line, beginning with Message-ID
- Open the Replace dialog (Ctrl + H)
- Uncheck the Wrap around box option
- Then, use this complete regex S/R:
  - SEARCH
    (?-is:^Message-ID:(?:|\R)\x20<.{4}|(?!\A)\G).*?\K.(?=.*.{4}@)
  - REPLACE
    _
The INPUT example text, below :
Message-ID: <S3HC1075.C9B9D2W4@...> Message-ID: <F2CA98A6AWK-LDLL2WQVUFZWSA-8N4-O2WAX-DHGJ2@x> Message-ID: <20230112021536.D44C04F585AAECBA@...> Message-ID: <CAAGbbBVLx-B8sc2ifvzrHEUHTYEvSxkK5=OcDWxk=UKM3eMo2g@...> Message-ID: <347545688.95523.1695404577920.JavySail.boot@...> Message-ID: <S3HC1075.C9B9D2W4@...> Message-ID: <F2CA98A6AWK-LDLL2WQVUFZWSA-8N4-O2WAX-DHGJ2@x> Message-ID: <20230112021536.D44C04F585AAECBA@...> Message-ID: <CAAGbbBVLx-B8sc2ifvzrHEUHTYEvSxkK5=OcDWxk=UKM3eMo2g@...> Message-ID: <347545688.95523.1695404577920.JavySail.boot@...> Message-ID: <B7B207D39A1D4104860D8073A027CCD51DACD65B146E@...> Message-ID: <286f14f5-bb9d-4c26-ae24-2d4ae7484bb3@...>
Is automatically changed as the following OUTPUT text :
Message-ID: <S3HC_________D2W4@...> Message-ID: <F2CA__________________________________HGJ2@x> Message-ID: <2023_______________________ECBA@...> Message-ID: <CAAG___________________________________________Mo2g@...> Message-ID: <3475___________________________________boot@...> Message-ID: <S3HC_________D2W4@...> Message-ID: <F2CA__________________________________HGJ2@x> Message-ID: <2023_______________________ECBA@...> Message-ID: <CAAG___________________________________________Mo2g@...> Message-ID: <3475___________________________________boot@...> Message-ID: <B7B2____________________________________146E@...> Message-ID: <286f____________________________4bb3@...>
Using the free-spacing mode, with the leading
(?x)
modifier, your search regex would become:
SEARCH
(?x) (?-is: ^ Message-ID : (?: | \R) \x20 < .{4} | (?! \A) \G) .*? \K . (?= .* .{4} @)
The replacement is identical, of course!
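To see what the `\G` chaining does step by step, here is a rough Python emulation. It is only a sketch under assumptions: Python's `re` supports neither `\G`, `\K` nor `\R`, so the loop below stands in for the "continue exactly where the previous match ended" behaviour, the line break is written as `\r?\n`, and the names `ANCHOR`, `TAIL` and `mask` are made up. It assumes one Message-ID per line, as in the example text.

```python
import re

# Leading part of the search: "Message-ID:", nothing or a line break, a space,
# "<" and the four characters to keep.
ANCHOR = re.compile(r"^Message-ID:(?:|\r?\n) <.{4}", re.MULTILINE)

# Body of the look-ahead (?=.*.{4}@): at least four characters followed by "@"
# must still come later on the same line.
TAIL = re.compile(r".*.{4}@")

def mask(text):
    out = list(text)
    for m in ANCHOR.finditer(text):
        pos = m.end()
        # Emulates the (?!\A)\G branch: each step starts exactly where the
        # previous single-character replacement ended, and stops as soon as
        # the look-ahead no longer holds.
        while pos < len(text) and text[pos] != "\n" and TAIL.match(text, pos + 1):
            out[pos] = "_"
            pos += 1
    return "".join(out)

print(mask("Message-ID: <347545688.95523.1695404577920.JavySail.boot@...>"))
```

Each pass of the loop corresponds to one match of the search regex, which is why the replacement in Notepad++ is a single underscore.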
Best Regards,
guy038
-