Deleting a group of characters in lines with same beginning and ending, but different in between (re-post)

Polar Bear

Fellow Notepad++ Users,
Could you please help me with the following search-and-replace problem I am having?
I want to delete the [/b] in lines with the same beginning and ending, but different characters in between, like these:

Here is the data I currently have (“before” data):

[m3][c #0A5D00]▸[i] You’d get up early[/i][/b][/c][/m3]
[m3][c #0A5D00]▸[i] We prefer cheese[/i][/b][/c][/m3]
[m3][c #0A5D55]▸[b][i] charity fund[/i][/b][/c][/m3]
[m3][c #0A5D00]▸[i] They never came[/i][/b][/c][/m3]
[m3][c #0A5D55]▸[b][i] board of charity[/i][/b][/c][/m3]

Here is how I would like that data to look (“after” data):

[m3][c #0A5D00]▸[i] You’d get up early[/i][/c][/m3]
[m3][c #0A5D00]▸[i] We prefer cheese[/i][/c][/m3]
[m3][c #0A5D55]▸[b][i] charity fund[/i][/b][/c][/m3]
[m3][c #0A5D00]▸[i] They never came[/i][/c][/m3]
[m3][c #0A5D55]▸[b][i] board of charity[/i][/b][/c][/m3]

To accomplish this, I have tried using the following Find/Replace expressions and settings:
• Find What = [m3][c #0A5D00]▸[i]*[/i][/b][/c][/m3]
• Replace With = [m3][c #0A5D00]▸[i]*[/i][/c][/m3]
• Search Mode = all three, one after another (REGULAR EXPRESSION, then NORMAL, then EXTENDED)
• Dot Matches Newline = NOT CHECKED
I tried the Find What function first, but it didn’t work, and I’m not sure why.
Could you please help me understand what went wrong and help me find the solution?
Thank you.

Neil Schipper

Hi. You were sort of getting there, but you’re missing a few techniques:

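• \Q…\E to force special characters (like square brackets) to be treated as literals
• * is not the simple wildcard you may be used to: in a regex it means “zero or more of the preceding item”, so “any text on the line” is written .* (or the lazy .*?)
• \K to throw away everything matched so far, so that only what follows it gets replaced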
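To see concretely why the original Find What text failed, here is a small Python sketch. Python is used only for illustration; Notepad++ itself uses the Boost regex engine. Python's re module has no \Q…\E, so re.escape() stands in for it here.

```python
import re

line = "[m3][c #0A5D00]▸[i] You’d get up early[/i][/b][/c][/m3]"

# The OP's Find What text, handed to a regex engine unchanged. Each [...] is
# parsed as a character class matching ONE character from the set (so [m3]
# means "m or 3"), and [i]* means "zero or more letter i", not "any text".
op_pattern = "[m3][c #0A5D00]▸[i]*[/i][/b][/c][/m3]"
print(re.search(op_pattern, line))  # None: no match anywhere in the line

# Escaping the literal parts (re.escape plays the role of \Q...\E here) and
# using the lazy .*? as the "any text" wildcard gives a pattern that matches.
fixed = (re.escape("[m3][c #0A5D00]▸[i]") + ".*?"
         + re.escape("[/i][/b][/c][/m3]"))
print(re.search(fixed, line) is not None)  # True
```

The character-class reading is why nothing was found: just before ▸ the pattern needs one of c, space, #, 0, A, 5 or D, but the line has ] there.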

This should do it: \Q[m3][c #0A5D00]▸[i]\E.*?\K\Q[/b]\E
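Python's re module supports neither \Q…\E nor \K, so the following sketch only approximates this regex: re.escape() replaces \Q…\E, and a capturing group marks the text that would remain in the match after \K. The helper remove_group1 is a hypothetical name introduced for this example; it deletes just that group, mirroring “the prefix must match, but only what follows \K is replaced with nothing”.

```python
import re

before = "\n".join([
    "[m3][c #0A5D00]▸[i] You’d get up early[/i][/b][/c][/m3]",
    "[m3][c #0A5D00]▸[i] We prefer cheese[/i][/b][/c][/m3]",
    "[m3][c #0A5D55]▸[b][i] charity fund[/i][/b][/c][/m3]",
    "[m3][c #0A5D00]▸[i] They never came[/i][/b][/c][/m3]",
    "[m3][c #0A5D55]▸[b][i] board of charity[/i][/b][/c][/m3]",
])

# Approximates \Q[m3][c #0A5D00]▸[i]\E.*?\K\Q[/b]\E: the prefix must match,
# and group 1 is the only part that \K would leave in the match.
# "." does not cross newlines, so each match stays on one line.
pattern = re.compile(re.escape("[m3][c #0A5D00]▸[i]") + r".*?(\[/b\])")

def remove_group1(pattern, text):
    """Delete only group 1 of every match and keep everything else,
    like an empty replacement applied after \\K."""
    pieces, last = [], 0
    for m in pattern.finditer(text):
        pieces.append(text[last:m.start(1)])  # keep text up to the target
        last = m.end(1)                       # skip the target itself
    pieces.append(text[last:])
    return "".join(pieces)

after = remove_group1(pattern, before)
print(after)
```

The #0A5D55 lines are left alone because the escaped prefix never matches on them.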

Neil Schipper

@Neil-Schipper And, since the match is only on what we want removed, we keep “Replace with” empty.

guy038

Hello, @polar-bear, @neil-schipper and All,

This simple regex should work, too :

SEARCH (?-si)^(.+#0A5D00.+)\[/b\]
REPLACE $1

Preferably, tick the Wrap around option

Select the Regular expression search mode

Click either the Replace All button once, or the Replace button several times

Notes :

The modifiers (?-si) ensure that the . matches newline option is not checked and that the Match case option is checked
Then, after the beginning of line ( ^ ), the part .+#0A5D00.+ matches a run of standard characters up to and including the string #0A5D00, then another non-empty run of standard characters up to…
…The literal string [/b] ( Note that the square brackets [ and ] have a special meaning in regexes. So, they must be escaped, as \[ and \], in order to search for these characters literally )
As the part .+#0A5D00.+ is embedded in parentheses, it is stored as group 1 and can be re-used in the replacement regex with the $1 or \1 syntax. So only the part \[/b\] is not rewritten, i.e., it is deleted!
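For comparison, here is roughly the same SEARCH/REPLACE pair in Python's re module, sketched on two of the OP's sample lines. Note that Python's replacement syntax is \1, not $1.

```python
import re

before = "\n".join([
    "[m3][c #0A5D00]▸[i] You’d get up early[/i][/b][/c][/m3]",
    "[m3][c #0A5D55]▸[b][i] charity fund[/i][/b][/c][/m3]",
])

# Python's re is case-sensitive and its "." excludes newlines by default,
# which is what (?-si) asks for in Notepad++. re.MULTILINE lets ^ match at
# the start of every line, as it does in Notepad++.
pattern = re.compile(r"^(.+#0A5D00.+)\[/b\]", re.MULTILINE)

# Replace the whole match with group 1 only, so "[/b]" disappears.
after = pattern.sub(r"\1", before)
print(after)
```

Because the second .+ is greedy, it is the last [/b] on a matching line that gets removed.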

Best Regards,

guy038

Neil Schipper

@Neil-Schipper

aaaaaaaaand I forgot to say it’s a regex

aaaaaaaaand I forgot to say: you may click Find to satisfy yourself that it’s matching the text to remove; when you want to apply the changes to the whole file, use Replace All. This is a class of regex for which a single Replace operation does not work, for reasons I don’t understand.

Neil Schipper

@polar-bear Note that @guy038’s solution, which does not rely on \K, allows you to do single Replace operations in case you had a need to interactively check each instance before replacing.

It’s also very tolerant of unspecified text both before and after #0A5D00, while mine uses more rigid constraints.

guy038

Hi, @polar-bear, @neil-schipper and All,

Neil, you’re right about the general behavior of my regex. Yours is more robust, of course.

However, the example provided by the OP is really minimalist : we don’t know anything about other possible #0A.... strings, different from #0A5D00 and #0A5D55. We don’t know about the context of these lines, and so on…

So, I simply rely on the differences in the #0A.... part!

Now, if @polar-bear wants to search, for example, for the strings #0A5D00, #0A5C00, #0A5E50 and #0A5FFF simultaneously, my regex would become :

SEARCH (?-si)^(.+#0A(?:5D00|5C00|5E50|5FFF).+)\[/b\]

REPLACE $1
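If the list of colours grows, the alternation can also be generated from a list. This Python sketch uses the four codes above; the #0A5C00 sample line is invented for illustration and is not part of the OP's data.

```python
import re

colors = ["5D00", "5C00", "5E50", "5FFF"]

# Build the non-capturing alternation (?:5D00|5C00|5E50|5FFF) from the list;
# re.escape is a safeguard in case an entry ever contains special characters.
alternation = "|".join(re.escape(c) for c in colors)
pattern = re.compile(r"^(.+#0A(?:" + alternation + r").+)\[/b\]", re.MULTILINE)

sample = "\n".join([
    "[m3][c #0A5C00]▸[i] hypothetical line[/i][/b][/c][/m3]",  # in the list
    "[m3][c #0A5D55]▸[b][i] charity fund[/i][/b][/c][/m3]",    # not in the list
])
result = pattern.sub(r"\1", sample)
print(result)
```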

BR

guy038

P.S.:

Oh, I just noticed in the OP’s example that [/b] must be deleted only in lines which do not contain the string [b] before the random text. So, maybe his challenge could be expressed, in plain language, as :

“How to delete the [/b] string in any line which does not contain a [b] string before” ???
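One way to express that spec directly, sketched here in Python on three of the OP's sample lines, is a “tempered” dot: (?:(?!\[b\]).)* matches any run of characters that never starts a [b]. This ignores the colour code entirely and is only one possible reading of the spec; the same pattern should also work in Notepad++'s regular expression mode, with $1 or \1 as the replacement.

```python
import re

before = "\n".join([
    "[m3][c #0A5D00]▸[i] You’d get up early[/i][/b][/c][/m3]",
    "[m3][c #0A5D55]▸[b][i] charity fund[/i][/b][/c][/m3]",
    "[m3][c #0A5D00]▸[i] They never came[/i][/b][/c][/m3]",
])

# Group 1 may contain anything except the start of "[b]", so a "[/b]" is
# only matched (and removed) when no "[b]" precedes it on the same line.
# Note that "[/b]" itself does not contain the substring "[b]".
pattern = re.compile(r"^((?:(?!\[b\]).)*)\[/b\]", re.MULTILINE)

after = pattern.sub(r"\1", before)
print(after)
```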

Wait and see !

Neil Schipper

@guy038 said in Deleting a group of characters in lines with same beginning and ending, but different in between (re-post):

“How to delete the [/b] string in any line which does not contain a [b] string before”

I am just realizing, late in the game, his spec could have been very concisely stated, “delete [/b] from all uncharitable lines”! (You may not give up until you find the pun.)

@polar-bear Note that the two solutions presented to you differ in another interesting way:

#1 satisfies a “before-text” precondition, then matches only the “to-remove-text” which is replaced with nil; it “takes away”

#2 matches both the “before-text” and the “to-remove-text”, but the “before-text” is stored in its own numbered basket (group 1), and that is what replaces the total match; it “replaces a whole with a part”.

This gives you an idea of the power and flexibility of this rather unpretty programming language.

Polar Bear

I’ve been able to get the job done using guy038’s suggestion, which seemed the simplest to me.
(
SEARCH (?-si)^(.+#0A5D00.+)\[/b\]

REPLACE $1

… )

Anyhow, thank you both for taking the trouble to help.
With best wishes
