Unable to use Replace in files

dr ramaanand

I tried to Find (after skipping some strings) this in another open file and got the same error (Invalid Regular expression). However, if I copy the block of text typed right on top, in my first query/post in this thread, into that open file, I am able to find the same.

Alan Kilborn

@dr-ramaanand

From the error message, either your regex is invalid, or the regex operating on your dataset causes the regex operation to become too computationally intensive.

PeterJones

@dr-ramaanand said in Unable to use Replace in files:

Invalid Regular Expression

If you hover over the [...] in the error message, does it say that the expression is too complex, or similar? Because with so many fancy constructs, your regex has to backtrack a lot, and it’s probably too much given the size of your file. If so, the regex would need to be made less “open”, or otherwise less complex, to work.

As you have ignored dozens of times, regex is the wrong tool for editing XML/HTML.

dr ramaanand

@PeterJones Can I add something to stop searching once it finds </html>? If so, what?
On clicking the icon you typed above, I get this message: The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous. This exception is thrown to prevent eternal matches that take an indefinite period time to locate. Using this in the Replace in Files field with the Regular expression mode ticked/selected did not change anything: ((?:<p[^>]*?color: black.*?>[\S\s\n]*?<\/p>\s*<span[^>]*>)|(?:\b<span[^>]*?color: black.*?>[\S\s\n]*?<code[^>]*>\b))(*SKIP)(*F)|\b<code\s*style="background-color:\s*transparent;">\b

PeterJones

@dr-ramaanand said in Unable to use Replace in files:

If so, what?

No clue. Once a regex involves (*SKIP)(*F), it’s way beyond my regex skill level, and anything that came close to approaching “needing” such constructs, I could always do more quickly by writing a Perl program which would do the manipulation, using an XML library and maybe, if still necessary, involving simple Perl regex on just the contents of the tags, never on the tags themselves. I would never waste my time trying to tweak all these fragile regexes to manipulate XML/HTML in as complicated a manner as you are, and I won’t waste my time or yours trying to help you do such a foolish thing.

As I’ve said before, and will say again, you are using the wrong tool for the job. There is only so much a screwdriver can do when what you really need is a jig saw.
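
For readers wondering what the parser-based route could look like, here is a minimal sketch. It is in Python rather than Perl (a substitution, since the thread names no specific library) and uses the standard-library html.parser module. The names CodeFinder and style_props and the three-line SAMPLE are hypothetical. It also assumes that "preceded by" can be read as "is the parent element", which is not exactly what the regexes in this thread test.

```python
from html.parser import HTMLParser


def style_props(style):
    """Split an inline style string into a {property: value} dict."""
    props = {}
    for decl in (style or "").split(";"):
        name, sep, value = decl.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip().lower()
    return props


class CodeFinder(HTMLParser):
    """Hypothetical helper: records line numbers of the wanted <code> tags."""

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements as (tag, style properties)
        self.hits = []   # line numbers of <code> tags that are kept

    def _black_context(self):
        # True if the parent is a <span> with color: black, or a <span>
        # whose own parent is a <p> with color: black.
        if not self.stack or self.stack[-1][0] != "span":
            return False
        if self.stack[-1][1].get("color") == "black":
            return True
        return (len(self.stack) > 1
                and self.stack[-2][0] == "p"
                and self.stack[-2][1].get("color") == "black")

    def handle_starttag(self, tag, attrs):
        props = style_props(dict(attrs).get("style"))
        if (tag == "code"
                and props.get("background-color") == "transparent"
                and not self._black_context()):
            self.hits.append(self.getpos()[0])
        self.stack.append((tag, props))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


# Hypothetical three-line sample, one case per line.
SAMPLE = (
    '<p style="color: black;"><span><code style="background-color: transparent;">a</code></span></p>\n'
    '<span style="color: black;"><code style="background-color: transparent;">b</code></span>\n'
    '<p style="color: cyan;"><span><code style="background-color: transparent;">c</code></span></p>\n'
)

finder = CodeFinder()
finder.feed(SAMPLE)
finder.close()
print(finder.hits)  # [3]: only the <code> on the third line is kept
```

Because the style attribute is split into properties, background-color: black is not mistaken for color: black, which a plain text search for "color: black" cannot tell apart. The sketch does not handle void elements such as <br>, which would stay on the stack and hide the real parent.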

dr ramaanand

@PeterJones The Regular expression ((?><p[^>]*?color: black.*?>[\S\s\n]*?<\/p>\s*<span[^>]*>)*)<code\s*style="background-color:\s*transparent;"[^>]*> helps skip the <p.......color: black...>(any white spaces)<span.......> that precedes <code\s*style="background-color:\s*transparent;"[^>]*>. Now how do I tweak it to also skip matching the <span.......color: black...> that precedes <code\s*style="background-color:\s*transparent;"[^>]*>?

PeterJones

@dr-ramaanand ,

You don’t seem to understand. As I said an hour ago, “I won’t waste my time or yours trying to help you do such a foolish thing.”

If someone else wants to waste their time to help you do it the wrong way, that’s their choice. But the only help you will get from me on this is to be reminded that you are using the wrong tool for the job, and that is the reason that things are not working as you hope. I am sorry that you cannot seem to understand the message that I am trying to convey.

dr ramaanand

@PeterJones Notepad++ is the only tool I know that helps replace in (multiple) files which is why I asked this question here (in this community)

PeterJones

@dr-ramaanand ,

Notepad++ is the only tool I know that helps replace in (multiple) files which is why I asked this question here (in this community)

You need to find a content management tool to generate your HTML from some other source (like a database or similar), where it separates the content (the real text) from the structure (the HTML markup); such tools make editing the content separately from the structure a simple task; but we cannot tell you what content management tool to use, as this forum is about Notepad++, not about the esoterica of a specific instance of data manipulation.

Notepad++ is not the only tool that uses Boost regex; any discussion forum that specializes in Boost regex, or any generic regex forum that includes Boost regex as part of its repertoire would be able to answer your regex questions. Almost none of your questions have involved the specifics of how Notepad++ accesses or interfaces with the Boost regex engine, or the particular settings that Notepad++ uses; none of your questions deal with the specifics of how Notepad++ interacts between the find-in-file mechanism and the Boost regex engine. If any of your questions were specific to Notepad++, I would be all for you asking them here. But they aren’t. Your questions are Notepad++ agnostic, in that any tool that uses the Boost regex engine would require the same answer.

For brand new users, or people who have only asked one or two search/replace questions in the forum, I’m willing to extend them the courtesy to help them with the first few questions, and point them to regex resources to help them learn in the future. But you’ve extended way beyond that, and gone beyond my “grace period” for regex questions. This forum is not meant to be a “full course on regex” or “be my regex tutor” or “answer all my regex questions” or “write all my regex for me”; but the vast majority of your questions and posts in this forum indicate that is how you seem to use this forum.

On this specific question, you actually had one part of your question that was totally on-topic for the forum: “why is Notepad++ telling me this regex is invalid”; and I happily answered that part.

At this point, I am going back to not replying to your posts, as I’ve made my point clear. If someone else disagrees with me, and feels that you have not extended beyond being “on topic” for this forum, and is still interested in providing you free regex-writing services yet again, they are free to do so.

guy038

Hello, @dr-ramaanand, @peterjones, @alan-kilborn and All,

First, @dr-ramaanand, I don’t understand why you insist on using very complicated regexes, with backtracking verbs ?! There are certainly simpler ways to get the job done ! To my mind, if you’re able to express your needs in natural language, you’ll probably find the correct regex more easily !
Secondly, as usual, you must test on a single file to begin with and verify that the replacement is exactly what you expect !
Thirdly, if you are going to replace in a lot of files in one go, please back up these files first : one never knows !
Fourthly, it could be that one or several files, needing replacement, are too important in size or complicated and lead to the message : The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous. You could probably try the replacement on subsets of all your files ?

Regarding your regex :

It contains two parts :

The regex <code\s*style="background-color:\s*transparent;" which represents the true searched string
The regex ((?:<p[^>]*?color: black.*?>[\S\s\n]*?<\/p>\s*<span[^>]*>)|(?:<span[^>]*?color: black.*?>[\S\s\n]*?<code)) which is followed by the backtracking verbs (*SKIP)(*F). Note that this is the first alternative of the regex and, as soon as this part is matched, the matched text is skipped and the search fails at this position. Thus, this first alternative is ignored and the regex engine goes on, after the skipped text, looking for the second alternative ( our searched string )
Of course, in the case that this first alternative cannot be found at all, the regex engine simply tries the second alternative !

@dr-ramaanand, given your example, below :

Mark the part of the text that you are looking for, that is, the regex :
- (?s)<code\s*style="background-color:\s*transparent;" : You should get 6 occurrences
Now, move to the Find dialog and search for the first alternative, without the part (*SKIP)(*F). So the regex :
- (?s)((?:<p[^>]*?color: black.*?>[\S\s\n]*?<\/p>\s*<span[^>]*>)|(?:<span[^>]*?color: black.*?>[\S\s\n]*?<code))
As you can see, it spans two lines and the first red mark is embedded in the match, so it will not be considered. Thus, near the end, the red mark of the second line is matched. The third, short line is matched as well.
Now, after clicking the Find Next button, note that the second occurrence of the first alternative OVERLAPS the fourth red mark, so this searched string cannot be found at this point and a searched string is only found at the very end of the fifth line. Then, the sixth red mark is also matched on the sixth non-empty line !

You can verify, using the whole regex, that, indeed, only four zones, out of the 6, are matched !

<html>
<p style="font-family: &quot;verdana&quot;; font-size: 18px; color: black; line-height: 18px; text-align: justify; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; background-color: cyan;"><span style="font-size: 13.5pt; font-family: &quot;Verdana&quot;,&quot;sans-serif&quot;;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
<span><span style="font-size: 13.5pt; font-family: &quot;Verdana&quot;,&quot;sans-serif&quot;; background-color: cyan;"><code style="background-color: transparent;"><b>some text here</b></code></span>

<code style="background-color: transparent;">

<p style="font-family: &quot;verdana&quot;; font-size: 18px; color: cyan; line-height: 18px; text-align: justify; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; background-color: cyan;"><span style="color: black; font-size: 13.5pt; font-family: &quot;Verdana&quot;,&quot;sans-serif&quot;;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
<span><span style="font-size: 13.5pt; font-family: &quot;Verdana&quot;,&quot;sans-serif&quot;; background-color: cyan;"><code style="background-color: transparent;"><b>some text here</b></code></span>
<p style="font-family: &quot;verdana&quot;; font-size: 18px; color: cyan; line-height: 18px; text-align: justify; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; background-color: navy;"><span style="font-size: 13.5pt; font-family: &quot;Verdana&quot;,&quot;sans-serif&quot;;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
</html>
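
The counts above can be cross-checked outside Notepad++. The sketch below uses Python's built-in re module, which has no (*SKIP)(*F) verbs, so it swaps in a different technique: the excluded pattern and the searched string are joined as alternatives, and only the matches where the capturing group took part are kept. SAMPLE is a copy of the example with the style attributes shortened (an abbreviation made for this sketch), and the names TARGET, EXCLUDE, wanted and line_of are hypothetical.

```python
import re

# Shortened copy of the example: only the style properties that matter are kept.
SAMPLE = """<html>
<p style="font-size: 18px; color: black; background-color: cyan;"><span style="font-size: 13.5pt;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
<span><span style="font-size: 13.5pt; background-color: cyan;"><code style="background-color: transparent;"><b>some text here</b></code></span>

<code style="background-color: transparent;">

<p style="font-size: 18px; color: cyan; background-color: cyan;"><span style="color: black; font-size: 13.5pt;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
<span><span style="font-size: 13.5pt; background-color: cyan;"><code style="background-color: transparent;"><b>some text here</b></code></span>
<p style="font-size: 18px; color: cyan; background-color: navy;"><span style="font-size: 13.5pt;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
</html>
"""

# The searched string and the part that preceded (*SKIP)(*F) in the thread.
TARGET = r'<code\s*style="background-color:\s*transparent;"'
EXCLUDE = (r'(?:<p[^>]*?color: black.*?>[\S\s\n]*?<\/p>\s*<span[^>]*>)'
           r'|(?:<span[^>]*?color: black.*?>[\S\s\n]*?<code)')

# Excluded text is consumed by the first branch; group 1 is set only
# when the searched string matched on its own.
combined = re.compile(r'(?s)(?:' + EXCLUDE + r')|(' + TARGET + r')')


def wanted(text):
    """Return the matches of TARGET that were not swallowed by EXCLUDE."""
    return [m for m in combined.finditer(text) if m.group(1) is not None]


def line_of(text, match):
    """1-based line number on which a match starts."""
    return text.count("\n", 0, match.start()) + 1


all_hits = re.findall(TARGET, SAMPLE)
kept = wanted(SAMPLE)
print(len(all_hits), "occurrences,", len(kept), "kept")  # 6 occurrences, 4 kept
print([line_of(SAMPLE, m) for m in kept])                # [3, 5, 8, 9]
```

Counting the <html> line as line 1, the kept matches sit on the second, third, fifth and sixth non-empty lines of the example body, which agrees with the walkthrough above.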

However, I could not understand exactly in which situations you need to do a replacement and, of course, the contents of the replacement itself !

So, read this post first ! After a 2-hour lunch break, try to be connected : I’ll post my e-mail address. Note it as soon as possible, as it will be displayed for only a short time ! I’ll probably delete this short message !

Best Regards,

guy038

dr ramaanand

@guy038 My Regular expression helped skip finding the <code\s*style="background-color:\s*transparent;"[^>]*> if it was preceded by <p.......color: black...>(any white spaces, including if they were on the next line)<span.......>, or <span.......color: black...> and found other strings of <code\s*style="background-color:\s*transparent;"[^>]*> in the current, open file, but not in all the files of a folder (when I selected “Find in files” and clicked on “Replace in files”), probably because it is getting into a continuous loop. How do I keep it from getting into a loop? I get an “Invalid Regular expression” error if I try to find or replace this in multiple files of a folder; then, on clicking the […] icon next to where it shows that “Invalid Regular expression” error, I get this message: “The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous. This exception is thrown to prevent eternal matches that take an indefinite period time to locate.” I will manage any replacements on my own. Thank you very much!

guy038

This post is deleted!

guy038

Hi, @dr-ramaanand,

I hope you got time to memorize my e-mail address !

BR

guy038

dr ramaanand

@guy038 No, I believe that you posted it after I slept and deleted it before I woke up. In the meantime, I worked on that RegEx and arrived at this Regular expression as a solution: (?s)(?:<p[^>]*?color:\s*black[^>]*>\s*<span[^>]*>\s*<code|<span[^>]*?color:\s*black[^>]*>\s*<code)(*SKIP)(*F)|<code\s*style="background-color:\s*transparent;"> - that can be shortened to (?s)(?:<p[^>]*?(color:\s*black[^>]*>\s*)<span[^>]*>\s*<code|<span[^>]*?$1<code)(*SKIP)(*F)|<code\s*style="background-color:\s*transparent;">

guy038

Hello, @dr-ramaanand,

You said in your last post :

I worked on that RegEx and arrived at this Regular expression as a solution: (?s)(?:<p[^>]*?color:\s*black[^>]*>\s*<span[^>]*>\s*<code|<span[^>]*?color:\s*black[^>]*>\s*<code)(*SKIP)(*F)|<code\s*style="background-color:\s*transparent;"> - that can be shortened to (?s)(?:<p[^>]*?(color:\s*black[^>]*>\s*)<span[^>]*>\s*<code|<span[^>]*?$1<code)(*SKIP)(*F)|<code\s*style="background-color:\s*transparent;">

No, you’re wrong ! Even if I apply, against your tiny example text, the first regex :

(?s)(?:<p[^>]*?color:\s*black[^>]*>\s*<span[^>]*>\s*<code|<span[^>]*?color:\s*black[^>]*>\s*<code)(*SKIP)(*F)|<code\s*style="background-color:\s*transparent;">

It returns the message : Mark: 4 matches in entire file

But, if I run the second regex :

(?s)(?:<p[^>]*?(color:\s*black[^>]*>\s*)<span[^>]*>\s*<code|<span[^>]*?$1<code)(*SKIP)(*F)|<code\s*style="background-color:\s*transparent;">

It returns the message : Mark: 5 matches in entire file

Thus, they are not identical !
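
A likely reason for the difference (a reading of the two patterns, not something stated in the thread): inside a search pattern, $1 is not a reference to group 1. The $ is the end-of-line anchor and the 1 is a literal digit, so the alternative <span[^>]*?$1<code can never match, and a <code that follows <span ... color: black ...> is no longer skipped. Even \1 would not help here, because a backreference repeats the text a group captured, not its subpattern. The sketch below reproduces the 4 versus 5 counts with Python's built-in re module. Since re has no (*SKIP)(*F), it swaps in an alternation with a named capturing group. SAMPLE is a shortened copy of the example text from earlier in the thread, and the names SKIP_FULL, SKIP_SHORT and count_kept are hypothetical.

```python
import re

# Shortened copy of the example: only the style properties that matter are kept.
SAMPLE = """<html>
<p style="font-size: 18px; color: black; background-color: cyan;"><span style="font-size: 13.5pt;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
<span><span style="font-size: 13.5pt; background-color: cyan;"><code style="background-color: transparent;"><b>some text here</b></code></span>

<code style="background-color: transparent;">

<p style="font-size: 18px; color: cyan; background-color: cyan;"><span style="color: black; font-size: 13.5pt;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
<span><span style="font-size: 13.5pt; background-color: cyan;"><code style="background-color: transparent;"><b>some text here</b></code></span>
<p style="font-size: 18px; color: cyan; background-color: navy;"><span style="font-size: 13.5pt;"><code style="background-color: transparent;"><b>some text here</b></code></span></p>
</html>
"""

TARGET = r'(?P<target><code\s*style="background-color:\s*transparent;">)'

# The parts that preceded (*SKIP)(*F) in the two regexes quoted above.
SKIP_FULL = (r'<p[^>]*?color:\s*black[^>]*>\s*<span[^>]*>\s*<code'
             r'|<span[^>]*?color:\s*black[^>]*>\s*<code')
SKIP_SHORT = (r'<p[^>]*?(color:\s*black[^>]*>\s*)<span[^>]*>\s*<code'
              r'|<span[^>]*?$1<code')  # "$1" = end anchor + literal "1"


def count_kept(skip, text):
    """Count TARGET matches that were not consumed by the skip branch."""
    combined = re.compile(r'(?s)(?:' + skip + r')|' + TARGET)
    return sum(1 for m in combined.finditer(text)
               if m.group('target') is not None)


print(count_kept(SKIP_FULL, SAMPLE))   # 4
print(count_kept(SKIP_SHORT, SAMPLE))  # 5
```

The extra match in the second run is the <code> that follows <span style="color: black; ..."> in the fourth non-empty line of the example body, which only the full pattern excludes.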

Again, here is my E-mail address :

See you later, by e-mail !

Best Regards,

guy038

dr ramaanand

@guy038 Okay, got it, thank you very much. Merci beaucoup

Alan Kilborn

@guy038 said:

it could be that one or several files, needing replacement, are too important in size or complicated and lead to the message : The complexity of matching the regular expression exceeded predefined bounds. Try refactoring the regular expression to make each choice made by the state machine unambiguous.

This made me wonder what happens when this problem occurs during Replace in Files.

Say this occurs in a file that isn’t opened by the user in Notepad++; does the file then get opened and the user is left looking at it (to know in which file the problem occurred)? If not, what happens to prior replacements already made in that file (that didn’t hit a complexity problem); are they then saved in the disk file? Does the search/replace then continue with the next file in the sequence or is the whole replacement set operation cancelled (probably)?

This whole thing doesn’t sound like a good situation…

I’d say it is probably advisable that every replacement operation of this nature should be preceded by the search operation (executed by the user), just to avoid the possibility of raising these issues – if the search yields a complexity problem, then (obviously!) don’t attempt the replacement.
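
One way to apply this search-before-replace advice outside Notepad++ is a small two-pass script. The sketch below is only an illustration in Python: replace_in_files and its parameters are hypothetical names, it uses Python's re engine rather than Boost (so it will not raise the same complexity error), and it assumes UTF-8 files. It also makes a .bak copy before writing, in line with the backup advice given earlier in the thread.

```python
import re
import shutil
import tempfile
from pathlib import Path


def replace_in_files(folder, glob, pattern, repl):
    """Search every file first; rewrite (after a .bak copy) only if all searches succeed."""
    rx = re.compile(pattern)
    counts = {}
    # Pass 1: search only. Any exception here aborts before a file is touched.
    for path in sorted(Path(folder).glob(glob)):
        text = path.read_text(encoding="utf-8")
        counts[path] = sum(1 for _ in rx.finditer(text))
    # Pass 2: back up and rewrite only the files that contain matches.
    for path, n in counts.items():
        if n:
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            text = path.read_text(encoding="utf-8")
            path.write_text(rx.sub(repl, text), encoding="utf-8")
    return {path.name: n for path, n in counts.items()}


if __name__ == "__main__":
    # Demo on throwaway files in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.html").write_text("<code>x</code> <code>y</code>", encoding="utf-8")
        (root / "b.html").write_text("<p>nothing here</p>", encoding="utf-8")
        print(replace_in_files(root, "*.html", r"<code>", "<code class='k'>"))
        print((root / "a.html").read_text(encoding="utf-8"))
```

Keeping the search pass separate means a failure in any file is seen before anything on disk has changed, which is the situation the questions above worry about.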

dr ramaanand

This Regular expression helped me replace in files (multiple files) only what I wanted: (?:<p(?!\w)[^>]*?color\s*:\s*(?:(black))[^>]*?>(?(1)(?:\s*<span(?!\w)[^>]*?>)?)|<span\b[^>]*?color\s*:\s*black[^>]*?>|<li\b[^>]*?style[^>]*?color\s*:\s*black[^>]*?>\s*<span\b[^>]*?>)(?s)\s*<code\b(?:".*?"|'.*?'|[^>]*?)+>(*SKIP)(*FAIL)|<code\b[^>]*?style[^>]*?background-color\s*:\s*transparent[^>]*?> as asked right on top in my first question of this thread, that is, it helps find <code style="background-color: transparent;"> if it is not preceded by <p.......color: black...>(any white spaces, including a new line)<span.......> or <span.......color: black...>

guy038

@dr-ramaanand,

Take the time to read my last e-mail to you, where I explained the differences between two simple regexes containing, each, the (*SKIP)(*F) syntax !

BR

guy038