Find-in-FIles: Can’t Replace Multiple Instances of Word
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
Is there some reason this functionality couldn’t be added to the search-and-replace in multiple files?
Your regex doesn’t do all the replacements in a single file when you do Replace All once, and you have to hit Replace or Replace All multiple times to get it to work, even with your “wraparound” option in a single file. Why would you expect the Find in Files version to behave any differently across multiple files? The problem is not with Notepad++'s user interface not having enough options, but in your understanding of the regex involved, and the way that “wrap around” works.
Regarding “wrap around”:
<PrincessBride character="Inigo">
I do not think that word means what you think it means.</PrincessBride>
It’s a holdover from the terminology present in Microsoft Windows’ “notepad.exe” search dialog since time immemorial: they called it “wrap around” because in normal searches, the search starts where your cursor is, but if it reaches the end of the file, it “wraps around” to the beginning once.
In the Notepad++ implementation, it follows that behavior for a standard search or replace. But with Search All or Replace All, as described here, it makes exactly one pass through the entire document, starting at the beginning (regardless of where your caret is) and going through to the very end.
This “exactly one pass” is why your original regex with Replace All in one file didn’t replace all instances unless you hit it multiple times, and it’s exactly the same reason it doesn’t Replace All in Files when you hit Replace All once.
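For anyone who wants to see the “exactly one pass” effect outside Notepad++, here is a minimal sketch in Python. The assumptions are mine: Python’s re module stands in for the Boost engine that Notepad++ uses, a capture group stands in for \K (which re does not have), and the sample text and the zone: prefix are made up for illustration.

```python
import re

# Made-up sample: only the line starting with "zone:" should be edited.
text = "zone: cat cat cat\ncat cat\n"

# Each match consumes the prefix plus the first remaining "cat" on that line,
# so one left-to-right pass can only fix one "cat" per "zone:" line.
pattern = re.compile(r"^(zone:.*?)cat", re.MULTILINE)

counts = []
while True:
    new_text, count = pattern.subn(r"\1dog", text)
    if count == 0:
        break
    counts.append(count)  # replacements made in this pass
    text = new_text

passes = len(counts)
print(passes, repr(text))
```

Each trip through the loop is one “Replace All”. It takes three of them to change all three words, because after each match the engine would need to see zone: again before it can match another word.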
You designed a regex that “consumes” too much of the data, so it cannot do everything in one pass unless you put it into an infinite loop mode (which I think is not the right idea, nor do I think Notepad++ should have to implement an “infinite loop mode”).
The regex syntax has a \G assertion, which means “match at the end of previous match”, and which can be used in a regex alternation to good effect (see, for example, its use in the FAQ: Generic Regex => Replacing in a Zone, where once you enter a zone, \G allows you to continue in the same zone, even though the regex cannot “see” the beginning of the zone any more).
A simple example text to show what \G allows in an alternation:
line prefix: word another word final word
word another word final word
line prefix: word another word final word
The regex FIND=
^line prefix:.*?\Kword
with REPLACE=newText only replaces the first occurrence of word on each line that starts with line prefix:, so you have to hit Replace All three times to do them all.
But if you change it to FIND=
(^line prefix:|\G).*?\Kword
then the same action only takes one Replace All.
Thus, if your original regex was working for doing a single replacement that had to be run multiple times:
lyrics-text.+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kword_to_find(?=(.+?</div>))
… then my guess is that this slight change, putting an alternation group before the \K with a \G as the second choice in the alternation, would work (but I don’t have the time to test it out thoroughly for you):
(?:lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kword_to_find(?=(.+?</div>))
-
@PeterJones Interesting. I’ve never had an occasion to use /g before. Maybe now’s the time for me to learn something new about the mysteries of regex.
At any rate, I probably won’t be able to look at it today. I’ll try to start digging into your suggestions tomorrow. Thanks for taking the time to give such detailed feedback!
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
/g
Exact characters and capitalization are important in regex. /g means “the literal / character followed by the literal lower-case g character”, whereas I used \G, which is the Continuation Escape found in the Anchors section of the regex documentation, and holds special meaning to regex. Using /g instead of \G will not give the same results.
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
Regarding a scripting language, I am familiar with PowerShell (posh). Could you give me some insight on how you think posh might be used with NPP?
I would not attempt to use PowerShell with, but rather instead of Notepad++.
I’m afraid I haven’t yet learned PowerShell (despite using it anyway), but you probably want to start by reviewing:
Get-ChildItem
Get-Content
New-Item
about Regular Expressions
-match and -replace operators
Establish a folder where you’ll put your results.
Get the collection of files you want to examine.
For each file, get the content.
Determine if the content requires modification.
Iterate through the content making the needed modifications.
Copy the file if it’s unchanged, write a new file if it is changed.
-
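The outline @Coises gives above (results folder, collect files, read, decide, modify, copy or write) could be sketched roughly as follows. Note the assumptions: this uses Python as a stand-in rather than the PowerShell cmdlets listed, the function name process_folder and the *.htm filter are hypothetical, and files are assumed to be UTF-8. The pattern you pass in would have to be written for Python’s re syntax, which differs from the Boost syntax Notepad++ uses.

```python
import re
import shutil
from pathlib import Path

def process_folder(src_dir, out_dir, pattern, replacement, glob="*.htm"):
    """Copy files from src_dir to out_dir, applying one regex replacement."""
    src, out = Path(src_dir), Path(out_dir)
    # Establish a folder for the results.
    out.mkdir(parents=True, exist_ok=True)
    regex = re.compile(pattern)
    changed = []
    # Get the collection of files to examine.
    for path in sorted(src.glob(glob)):
        # For each file, get the content.
        content = path.read_text(encoding="utf-8")
        # subn() both applies the replacement and reports whether anything matched.
        new_content, count = regex.subn(replacement, content)
        target = out / path.name
        if count == 0:
            # Unchanged: copy the file as-is.
            shutil.copy2(path, target)
        else:
            # Changed: write the modified content as a new file.
            target.write_text(new_content, encoding="utf-8")
            changed.append(path.name)
    return changed
```

A call such as process_folder("htm", "htm-out", r"star", "sun") would leave the originals untouched and return the names of the files that were actually modified, which makes it easy to review the results before replacing anything.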
@Coises Got it. Thanks!
-
@PeterJones I tried the regex modification you suggested
(?:lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kstar(?=(.+?</div>))
To keep it simple, I ran it against a single file in the editor (rather than multiple files on disk). The file text that was in the editor:
<!DOCTYPE HTML> <html lang="en-us"> <head> <meta charset="utf-8"> <title>Twinkle, Twinkle, Little Star</title> <meta name="description" content="Words: Jane Taylor, 1806. Music: ___, ___"> <meta name="keywords" content="Jane Taylor"> <link rel="stylesheet" href="../../css/hymn.css"> <script src="../../js/jquery.js"></script> <script src="../../js/base.js"></script> <script src="../../js/hymn.js"></script> <link rel="prev" href="../../htm/h/e/w/o/hewonsav.htm"> <link rel="next" href="../../htm/h/e/s/a/hesallwo.htm"> <link rel="up" href="../../ttl/ttl-h.htm"> </head> <body> <section id="preface"> <h1 class="screen-reader-only">Introduction</h1> <div class="preface-text"> <p><span class="lead">Words:</span> <a href="../../bio/t/a/y/l/taylor_jane.htm">Jane Taylor</a>, 1806.</p> <p><span class="lead">Music:</span> John Doe (<a href="../../mid/d/u/m/m/dummy.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../pdf/en/d/u/m/m/Dummy.pdf" title="Download score, PDF format">pdf</a> <a href="../../nwc/d/u/m/m/Dummy.nwc" title="Download score, Noteworthy Composer format">nwc</a>).</p> </div> </section> <p>This page is used to test global search-and-replace using regular expressions.</p> <section class="lyrics"> <h1 class="screen-reader-only">Lyrics</h1> <div class="stanzas"><div class="lyrics-text mc ll"> <p>Twinkle, twinkle, little star,<br> How I wonder what you are!<br> Up above the world so high,<br> Like a diamond in the sky.</p> <p>When the blazing sun is gone,<br> When he nothing shines upon,<br> Then you show your little light,<br> Twinkle, twinkle, all the night.</p> <p>Then the trav’ller in the dark,<br> Thanks you for your tiny spark,<br> He could not see which way to go,<br> If you did not twinkle so.</p> <p>In the dark blue sky you keep,<br> And often thro’ my curtains peep,<br> For you never shut your eye,<br> Till the sun is in the sky.</p> <p>’Tis your bright and tiny spark,<br> Lights the trav’ller in the dark:<br> Tho’ I know not what you are,<br> 
Twinkle, twinkle, little star.</p> </div></div> </section> </body> </html>
I put the cursor at the beginning of the file, clicked the Find Next button, and got this error.
If I click the Replace All button instead, a different message appears: Replace All: 0 occurrences were replaced in entire file.
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
The file text that was in the editor:
I tried that text with your original regex from your first post, and got the same result.
As I said, “if your original regex was working for doing a single replacement that had to be run multiple times”. Your text didn’t match your regex even once, thus the “if” condition was not met, and you should not expect my modification to work.
-
@PeterJones Oops. Mea culpa.
Forgot to change the search mode to Regular Expression. After doing that, and clicking Replace All, all 3 occurrences were replaced in one fell swoop. Sorry for the confusion.
My next step is to try it on multiple files on the hard disk.
Whew!
-
@PeterJones said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
I tried that text with your original regex from your first post, and got the same result.
My test of your regex+data was failing for a different reason than your test had failed: as you said, yours failed because you forgot to enable Regular Expression mode.
My test, on the other hand, failed because I didn’t have “. matches newline” checkmarked. Once I did that, I could get the search to work with either your original or my edited regex.
-
@PeterJones I love it when a plan comes together!
Will let you know how testing goes “in the wild” (i.e., on actual files).
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
(?:lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kstar(?=(.+?</div>))
…
all 3 occurrences were replaced in one fell swoop.
I was surprised it was 3 occurrences, because the first occurrence was before lyrics-text.
I was reminded that the \G can actually match the start of the text under certain circumstances. This isn’t 100% clear in the User Manual, but in the Boost Regex documentation that it links to, it says (emphasis mine),
The sequence \G matches only at the end of the last match found, or at the start of the text being matched if no previous match was found.
To prevent \G from matching the start, you need to make sure the first alternative consumes the \A: FIND =
(?s)(\A.*?lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kstar(?=(.+?</div>))
With that, it only finds and replaces 2 in your “Twinkle Twinkle” file, instead of 3.
-
@PeterJones said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
This isn’t 100% clear in the User Manual,
I have tweaked the UM to include the phrase from the boost manual, to make it more clear. It is doubtful anything I write, especially regarding regular expressions, can be 100% clear. ;-)
-
@PeterJones Peter, I’m unfamiliar with the syntax (?:lyrics-text|\G). It resembles a lookbehind, but all the lookbehinds I’ve seen look like (?<=a) (i.e., no colon). What exactly is this thing? Where is it documented?
-
@PeterJones said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
To prevent \G from matching the start, you need to make sure the first alternative consumes the \A: FIND = (?s)(\A.*?lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kstar(?=(.+?</div>))
With that, it only finds and replaces 2 in your “Twinkle Twinkle” file, instead of 3.
It might not matter in @Sylvester-Bullitt’s application, but it should be noted that if the file does not contain the text lyrics-text at all, the expression given will replace every occurrence of star.
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
@PeterJones I love it when a plan comes together!
Will let you know how testing goes “in the wild” (i.e., on actual files).
Here’s the problem file:
<!DOCTYPE HTML> <html lang="en-us"> <head> <meta charset="utf-8"> <title>Mary Had a Little Lamb</title> <meta name="description" content="Words: Sarah Hale, 1830. Music: None."> <meta name="keywords" content="Sarah Hale"> <link rel="stylesheet" href="../../css/hymn.css"> <script src="../../js/jquery.js"></script> <script src="../../js/base.js"></script> <script src="../../js/hymn.js"></script> <link rel="prev" href="../../htm/h/e/w/o/hewonsav.htm"> <link rel="next" href="../../htm/h/e/s/a/hesallwo.htm"> <link rel="up" href="../../ttl/ttl-h.htm"> </head> <body> <section id="preface"> <h1 class="screen-reader-only">Introduction</h1> <div class="preface-text"> <p><span class="lead">Words:</span> <a href="../../bio/h/a/l/e/hale_sjb.htm">Sarah J. Hale</a>, 1830.</p> <p><span class="lead">Music:</span> John Doe (<a href="../../mid/d/u/m/m/dummy.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../pdf/en/d/u/m/m/Dummy.pdf" title="Download score, PDF format">pdf</a> <a href="../../nwc/d/u/m/m/Dummy.nwc" title="Download score, Noteworthy Composer format">nwc</a>).</p> </div> </section> <p>This page is used to test global search-and-replace using regular expressions. 
</p> <section class="lyrics"> <div class="stanzas"><div class="lyrics-text mc ll"> <p>Mary had a little lambkin,<br> Its fleece was white as snow.<br> And everywhere that Mary went,<br> The lamb was sure to go.<br> He followed her to school one day,<br> That was against the rule.<br> It made the children laugh and play<br> To see a lamb at school.</p> <p>And so the teacher turned him out,<br> But still he lingered near,<br> And waited patiently about<br> Till Mary did appear.<br> And then he ran to her, and laid<br> His head upon her arm,<br> As if he said <q>I’m not afraid,<br> You’ll keep me from all harm.</q></p> <p><q>What makes the lamb love Mary so?</q><br> The eager children cry.<br> <q>‘Oh, Mary loves the lamb, you know,</q><br> The teacher did reply.<br> <q>And you each gentle animal<br> In confidence may bind,<br> And make them follow at your call,<br> If you are always kind.</q> </p> </div></div> </section> </body> </html>
When I searched for lamb, it found the expected instances in the lyrics section (class = “lyrics-text”), but surprisingly, it also found Lamb in the <title>. But the regex, as I originally wrote it, said it should only find matches after the string lyrics-text.
Did adding \G change the behavior? I think you mentioned that it might make subsequent searches start at the beginning of the file.
-
@Coises Ships passing in the night.
I just made a post saying that when I searched for the word lamb, it also found the word Lamb in the <title>, not just after lyrics-text. The search text is the same “Mary Had a Little Lamb” file shown in that post.
It sounds like your comment addresses that. Am I correct?
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
Ships passing in the night.
More than that, you aren’t noticing all the posts because of the rapid posting.
I already explained exactly what happened with \G in this post, which contains a fix for the \G issue.
@Coises’s follow-on showed that if any of your files don’t have lyrics-text at all, then my fix-for-\G will replace all instances of star or lamb or what have you, but I’m hoping, for your sake, that all the files that your Find in Files filter will match will contain lyrics-text somewhere.
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
(?:lyrics-text|\G)
(?:...) is a non-capturing subgroup.
-
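As a small aside on the (?:...) question, here is a sketch of how a non-capturing group differs from a capturing group and from a lookbehind. It uses Python’s re module purely for illustration (all three constructs exist there with the same spelling); the sample string and the chorus alternative are made up.

```python
import re

text = "lyrics-text star"

# Capturing group: the alternation is remembered as group 1.
capturing = re.search(r"(lyrics-text|chorus) (\w+)", text)

# Non-capturing group: same grouping for the alternation, but nothing is stored.
non_capturing = re.search(r"(?:lyrics-text|chorus) (\w+)", text)

# Lookbehind: checks what precedes the match without making it part of the match.
lookbehind = re.search(r"(?<=lyrics-text )star", text)

print(capturing.groups())      # ('lyrics-text', 'star')
print(non_capturing.groups())  # ('star',)
print(non_capturing.group(0))  # 'lyrics-text star'
print(lookbehind.group(0))     # 'star'
```

So (?:...) only groups the alternatives: the text it matches is still consumed and is still part of the overall match, which is different from a lookbehind, where the preceding text is checked but not included.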
@PeterJones said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
Ships passing in the night.
More than that, you aren’t noticing all the posts because of the rapid posting.
I already explained exactly what happened with \G in this post, which contains a fix for the \G issue.
@Coises’s follow-on showed that if any of your files don’t have lyrics-text at all, then my fix-for-\G will replace all instances of star or lamb or what have you, but I’m hoping, for your sake, that all the files that your Find in Files filter will match will contain lyrics-text somewhere.
Took me this long to get it (so I’ll post here rather than editing my earlier comment), but I think:
(?s)(\A.*?(lyrics-text|\Z(*COMMIT)(*FAIL))|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kstar(?=(.+?</div>))
fixes that problem.