Find-in-FIles: Can’t Replace Multiple Instances of Word
-
@Coises Ships passing in the night.
I just made a post saying when I searched for the word lamb, it also found the word Lamb in the <title>, not just after lyrics-text. Search text:
<!DOCTYPE HTML> <html lang="en-us"> <head> <meta charset="utf-8"> <title>Mary Had a Little Lamb</title> <meta name="description" content="Words: Sarah Hale, 1830. Music: None."> <meta name="keywords" content="Sarah Hale"> <link rel="stylesheet" href="../../css/hymn.css"> <script src="../../js/jquery.js"></script> <script src="../../js/base.js"></script> <script src="../../js/hymn.js"></script> <link rel="prev" href="../../htm/h/e/w/o/hewonsav.htm"> <link rel="next" href="../../htm/h/e/s/a/hesallwo.htm"> <link rel="up" href="../../ttl/ttl-h.htm"> </head> <body> <section id="preface"> <h1 class="screen-reader-only">Introduction</h1> <div class="preface-text"> <p><span class="lead">Words:</span> <a href="../../bio/h/a/l/e/hale_sjb.htm">Sarah J. Hale</a>, 1830.</p> <p><span class="lead">Music:</span> John Doe (<a href="../../mid/d/u/m/m/dummy.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../pdf/en/d/u/m/m/Dummy.pdf" title="Download score, PDF format">pdf</a> <a href="../../nwc/d/u/m/m/Dummy.nwc" title="Download score, Noteworthy Composer format">nwc</a>).</p> </div> </section> <p>This page is used to test global search-and-replace using regular expressions. 
</p> <section class="lyrics"> <div class="stanzas"><div class="lyrics-text mc ll"> <p>Mary had a little lambkin,<br> Its fleece was white as snow.<br> And everywhere that Mary went,<br> The lamb was sure to go.<br> He followed her to school one day,<br> That was against the rule.<br> It made the children laugh and play<br> To see a lamb at school.</p> <p>And so the teacher turned him out,<br> But still he lingered near,<br> And waited patiently about<br> Till Mary did appear.<br> And then he ran to her, and laid<br> His head upon her arm,<br> As if he said <q>I’m not afraid,<br> You’ll keep me from all harm.</q></p> <p><q>What makes the lamb love Mary so?</q><br> The eager children cry.<br> <q>‘Oh, Mary loves the lamb, you know,</q><br> The teacher did reply.<br> <q>And you each gentle animal<br> In confidence may bind,<br> And make them follow at your call,<br> If you are always kind.</q> </p> </div></div> </section> </body> </html>
It sounds like your comment addresses that. Am I correct?
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
Ships passing in the night.
More than that, you aren’t noticing all the posts because of the rapid posting.
I already explained exactly what happened with \G in this post, which contains a fix for the \G issue.
@Coises’s follow-on showed that if any of your files don’t have lyrics-text at all, then my fix-for-\G will replace all instances of star or lamb or what have you – but I’m hoping, for your sake, that all the files that your Find in Files filter will match will contain lyrics-text somewhere.
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
(?:lyrics-text|\G)
(?:...) is a non-capturing subgroup.
-
@PeterJones said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
Ships passing in the night.
More than that, you aren’t noticing all the posts because of the rapid posting.
I already explained exactly what happened with \G in this post, which contains a fix for the \G issue.
@Coises’s follow-on showed that if any of your files don’t have lyrics-text at all, then my fix-for-\G will replace all instances of star or lamb or what have you – but I’m hoping, for your sake, that all the files that your Find in Files filter will match will contain lyrics-text somewhere.
Took me this long to get it (so I’ll post here rather than editing my earlier comment), but I think:
(?s)(\A.*?(lyrics-text|\Z(*COMMIT)(*FAIL))|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kstar(?=(.+?</div>))
fixes that problem.
-
@PeterJones Three things:
- Yes, all the files (assuming they’re generated properly from my template) have the string lyrics-text.
- I just tried Coises’ suggested modification to the regex:
(?s)(\A.*?lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class=“chorus”>)\Klamb(?=(.+?</div>))
As advertised, it no longer matches the Lamb in the <title> tag, which is the desired behavior, since we’re only changing lyrics.
- What is the construct that resembles a lookbehind, but has the asterisk & question mark? That is,
(\A.*?lyrics-text|\G)
Still testing, but things are looking more and more promising!
-
-
@PeterJones “I see,” said the blind carpenter, as he picked up his hammer and saw!
-
Suggested reading:
Perl Regular Expression Syntax
Boost-Extended Format String Syntax
Notepad++ uses the Boost regular expression library. The above links are to the documentation for the current version; I believe Notepad++ is a couple of minor versions behind, but there should be little or no practical difference.
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
I just tried Coises’ suggested modification to the regex:
That was @PeterJones, not me. I was in the process of writing a post explaining why it couldn’t be done when he posted showing how to do it.
-
@Sylvester-Bullitt Good news!
Testing the new-and-improved regex against 2 files on disk worked perfectly!
It even worked when I had to undo a mistake with the replacement string, changing it to the one I really meant (I just changed the regex and clicked Replace All again).
So for now (fingers tightly crossed), it looks like we can declare victory! Does anyone have any more pearls of wisdom to add to this adventure?
Thank you so much for your help and patience.
By the way, if you’d like to see the Web site where this will be used, click here!
Cheers!
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
- What is the construct that resembles a lookbehind, but has the asterisk & question mark?
That was answered here
-
@PeterJones I spoke too soon. Sigh.
I just ran this regex against live Web site files (fortunately, just Find All, not replacing anything yet):
(?s)(\A.*?lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class=“chorus”>)\KSavior(?=(.+?</div>))
This regex had ignored the <title> element in my earlier tests, but it did not ignore the title in the file text below (i.e., it matched the word Savior in the title). Can anyone see why?
<!DOCTYPE HTML> <html lang="en-us"> <head> <meta charset="utf-8"> <title>Alas! and Did My Savior Bleed?</title> <meta name="alt-title" content="At the Cross"> <meta name="description" content="Words: Isaac Watts, 1709. Music: Hugh Wilson, 1800."> <meta name="keywords" content="Isaac Watts,Hugh Wilson,Ralph Hudson"> <link rel="stylesheet" href="../../../../../css/hymn.css"> <script src="../../../../../js/jquery.js"></script> <script src="../../../../../js/languages.js"></script> <script src="../../../../../js/base.js"></script> <script src="../../../../../js/hymn.js"></script> <link rel="prev" href="../../../i/r/f/airfille.htm"> <link rel="next" href="../../../b/n/a/abnature.htm"> <link rel="up" href="../../../../../ttl/ttl-a.htm"> <link rel="alternate" href="../../../../../non/es/e/n/l/a/enlacruz.htm" hreflang="es"> <link rel="alternate" href="../../../../../non/ml/a/l/a/s/alas_and_did_my_savior_bleed_ml.htm" hreflang="ml"> <link rel="alternate" href="../../../../../non/ml/a/l/a/s/alas_and_did_my_savior_bleed_2_ml.htm" hreflang="ml"> </head> <body> <section> <h1 class="screen-reader-only">Scripture Verse</h1> <div class="css-marquee" role="marquee"> <p><q>There is one God and one mediator between God and men, the man Christ Jesus, who gave Himself as a ransom for all men.</q> 1 Timothy 2:5–6</p> </div> </section> <section id="preface"> <h1 class="screen-reader-only">Introduction</h1> <figure><img alt="portrait" src="../../../../../img/w/a/t/t/watts_i.jpg" width="200" height="300"><figcaption>Isaac Watts<br>1674–1748</figcaption></figure> <div class="preface-text"> <p><span class="lead">Words:</span> <a href="../../../../../bio/w/a/t/t/watts_i.htm">Isaac Watts</a>, <cite class="book verbose">Hymns and Spiritual Songs</cite> 1707–09<span class="verbose">, Book 2, number 9. <q>Godly sorrow arising from the sufferings of Christ.</q> <a href="../../../../../bio/h/u/d/s/hudson_re.htm">Ralph E. 
Hudson</a> wrote the refrain in 1885</span>.</p> <p><span class="lead">Music:</span> <span class="music verbose">Martyrdom</span> <a href="../../../../../bio/w/i/l/s/o/n/h/wilson_h.htm">Hugh Wilson</a>, 1800 (<a href="../../../../../mid/m/a/r/t/martyrdom.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../../../../pdf/en/m/a/r/t/Martyrdom.pdf" title="Download score, PDF format">pdf</a> <a href="../../../../../nwc/m/a/r/t/Martyrdom.nwc" title="Download score, Noteworthy Composer format">nwc</a>)<span class="verbose"> (does not use the refrain)</span>.</p> <div class="alt-tune"> <p>Alternate Tunes:</p> <ul> <li><span>Abney (Hull)</span> <a href="../../../../../bio/h/u/l/l/hull_a.htm">Asa Hull</a> (1828–1907) (<a href="../../../../../mid/a/b/n/e/abney_hull.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../../../../pdf/en/a/b/n/e/Abney(Hull).pdf" title="Download score, PDF format">pdf</a> <a href="../../../../../nwc/a/b/n/e/Abney(Hull).nwc" title="Download score, Noteworthy Composer format">nwc</a>)</li> <li><span>Hudson</span> <a href="../../../../../bio/h/u/d/s/hudson_re.htm">Ralph E. Hudson</a>, <cite class="book">Songs of Peace, Love and Joy</cite> (<span class="map" onclick="show('Alliance,OH')">Alliance</span> Ohio: 1885) (<a href="../../../../../mid/h/u/d/s/hudson.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../../../../pdf/en/a/t/t/h/AtTheCross.pdf" title="Download score, PDF format">pdf</a> <a href="../../../../../nwc/a/t/t/h/AtTheCross.nwc" title="Download score, Noteworthy Composer format">nwc</a>) (uses refrain below). 
It is with this tune that the hymn is known as <span class="hymn-title">At the Cross.</span></li> <li><span>Liberty Hall</span> in <cite class="book">Wyeth’s Repository of Sacred Music</cite>, by <a href="../../../../../bio/w/y/e/t/wyeth_j.htm">John Wyeth</a>, 1810 (<a href="../../../../../mid/l/i/b/e/liberty_hall.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../../../../pdf/en/l/i/b/e/LibertyHall.pdf" title="Download score, PDF format">pdf</a> <a href="../../../../../nwc/l/i/b/e/LibertyHall.nwc" title="Download score, Noteworthy Composer format">nwc</a>)</li> </ul></div></div> <figure><img alt="illustration" src="../../../../../img/c/r/u/c/Crucifixion,SimonVouet.jpg" height="300" width="200"><figcaption>Crucifixion<br>Simon Vouet<br>1590–1649</figcaption></figure> </section> <section> <h1 class="screen-reader-only">Background</h1> <blockquote class="verbose mc"> <p>[In] the autumn of 1850…revival meetings were being held in the Thirtieth Street Methodist Church. Some of us went down every evening; and, on two occasions, I sought peace at the atlar [sic], but did not find the joy I craved, until one evening, November 20, 1850, it seemed to me that the light must indeed come then or never; and so I arose and went to the altar alone. After a prayer was offered, they began to sing the grand old consecration hymn,</p> <p lang="en-gb"><q>Alas, and did my Saviour bleed, and did my Sovereign die?</q></p> <p>And when they reached the third line of the fourth stanza,</p> <p><q>Here Lord, I give myself away,</q></p> <p>My very soul was flooded with a celestial light. I sprang to my feet, shouting <q>hallelujah,</q> and then for the first time I realized that I had been trying to hold the world in one hand and the Lord in the other.</p> <p><a href="../../../../../bib/c/crosby.htm">Crosby</a>, p. 
24</p> </blockquote> </section> <section class="lyrics"> <div class="audio"><audio class="primary" controls loop><source src="../../../../../ogg/m/a/r/t/martyrdom.ogg" type="audio/ogg"></audio></div> <h1 class="screen-reader-only">Lyrics</h1> <div class="stanzas"><div class="lyrics-text mc ll"> <p>Alas! and did my Savior bleed<br> And did my Sovereign die?<br> Would He devote that sacred head<br> For such a worm as I?</p> <p class="chorus">Refrain</p> <p class="chorus">At the cross, at the cross where I first saw the light,<br> And the burden of my heart rolled away,<br> It was there by faith I received my sight,<br> And now I am happy all the day!</p> <p>Thy body slain, sweet Jesus, Thine,<br> And bathed in its own blood,<br> While all exposed to wrath divine,<br> The glorious Sufferer stood!</p> <p class="chorus">Refrain</p> <p>Was it for crimes that I had done<br> He groaned upon the tree?<br> Amazing pity! grace unknown!<br> And love beyond degree!</p> <p class="chorus">Refrain</p> <p>Well might the sun in darkness hide<br> And shut his glories in,<br> When Christ, the mighty Maker died,<br> For man the creature’s sin.</p> <p class="chorus">Refrain</p> <p>Thus might I hide my blushing face<br> While His dear cross appears,<br> Dissolve my heart in thankfulness,<br> And melt my eyes to tears.</p> <p class="chorus">Refrain</p> <p>But drops of grief can ne’er repay<br> The debt of love I owe:<br> Here, Lord, I give my self away<br> ’Tis all that I can do.</p> <p class="chorus">Refrain</p> </div></div> </section> </body> </html>
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
Does anyone have any more pearls of wisdom to add to this adventure?
Be aware that these expressions match parts of words; e.g., the “star” in “starlight” or “restart” will be matched. I’ll leave it as an exercise for you to study a bit and attempt to find a fix for that, if it is a problem.
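For readers who want to see the partial-word issue in isolation, here is a minimal sketch. It uses Python's `re` module rather than the Boost engine inside Notepad++, and the sample sentence is made up for illustration; the `\b` word-boundary assertion behaves the same way for this simple case.

```python
import re

# Made-up sample text; "star" appears alone and inside longer words.
text = "The star shone on starlight; restart the star."

# Without boundaries, "star" also matches inside "starlight" and "restart".
loose = re.findall(r"star", text)

# \b asserts a word boundary, so only the standalone word matches.
strict = re.findall(r"\bstar\b", text)

print(len(loose), len(strict))
```

In the Notepad++ expressions above, the equivalent change would be wrapping the search word as \bstar\b.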
No regular expression thread is finished until @guy038 drops in to tell us that there’s a better way to do it.
-
You’ve found another glitch:
If there are no matches following lyrics-text, the expression we’ve suggested will match from the beginning of the file.
All three matches are in the head section of the document. There are no matches after lyrics-text, because the word is hyphenated in the lyrics text.
-
@Coises Is the glitch in the regular expression itself, or in the regex engine?
-
@Coises said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
No regular expression thread is finished until @guy038 drops in to tell us that there’s a better way to do it.
No kidding. But with the most recent failure, I think @guy038’s FAQ has already given us the solution that we should have used, if the OP had not stated it as an XY-Problem.
Looking at the example data, I think the better problem statement would be "please replace all instances of WORD_TO_FIND between the start <section class="lyrics" and end </section>". With that set of rules, just use the Generic Regex Formula > Replace in a specific zone of text, with FR = \bWORD_TO_FIND\b, and my “start” and “end” a sentence ago are the BSR and ESR. (Though you might have to add your (?<!^)(?<!<p>)(?<!<p class="chorus">) restrictions to the FR.)
Is the glitch in the regular expression itself, or in the regex engine?
The “glitch” is the expectation that one can safely edit HTML with regex (see FAQ). Since I’m sure you’ll insist on it anyway, then dealing with glitches is something you must expect, and that you must start putting effort into.
We’ve gone above-and-beyond in getting it working this well for you. At this point, it’s really time for you to start reading the same documentation that we’re reading, and try to figure it out on your own.
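The restated problem (replace a whole word only between <section class="lyrics"> and </section>) can also be sketched outside Notepad++. The following is a rough Python illustration, not the Generic Regex Formula itself: Python's `re` has no \G or \K, so it swaps in a two-step approach (find the zone, then replace inside it). The helper name `replace_in_lyrics` and the sample HTML are hypothetical, and the "not at start of line or paragraph" restrictions are left out.

```python
import re

# Hypothetical helper, not part of Notepad++ or the FAQ formula.
# Group 1 is the zone start (BSR), group 3 the zone end (ESR).
ZONE = re.compile(r'(<section class="lyrics">)(.*?)(</section>)', re.DOTALL)

def replace_in_lyrics(html: str, word: str, replacement: str) -> str:
    # Whole-word, case-sensitive search term (the FR in the formula's terms).
    word_re = re.compile(r"\b" + re.escape(word) + r"\b")

    def fix_zone(m: re.Match) -> str:
        # Only the text between the delimiters is changed.
        body = word_re.sub(lambda _: replacement, m.group(2))
        return m.group(1) + body + m.group(3)

    return ZONE.sub(fix_zone, html)

# Made-up sample: the word appears in the title and in the lyrics.
sample = (
    "<title>a little lamb</title>"
    '<section class="lyrics"><p>The lamb was sure to go.</p></section>'
)
result = replace_in_lyrics(sample, "lamb", "sheep")
print(result)
```

The title is left alone because it lies outside the zone, which is the behavior the thread has been trying to get from a single expression.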
-
@Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:
@Coises Is the glitch in the regular expression itself, or in the regex engine?
The expression. After testing, it appears that my variant, which avoids matching files that don’t contain lyrics-text, also fixes this problem. Applying a small simplification to the previous version (the \G was unnecessary) this:
(?s)(\A.*?(lyrics-text|\Z(*COMMIT)(*FAIL)).*?|)(?<!^)(?<!<p>)(?<!<p class="chorus">)\Ksavior(?=(.+?</div>))
matches nothing, as it should; this:
(?s)(\A.*?(lyrics-text|\Z(*COMMIT)(*FAIL)).*?|)(?<!^)(?<!<p>)(?<!<p class="chorus">)\Ksav\xADior(?=(.+?</div>))
matches the single occurrence of the word (which is hyphenated using a “soft hyphen”) in the lyrics, on line 63.
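To see why the plain word finds nothing in the lyrics, here is a small Python sketch (Python's `re`, not Boost). The sample string is made up, and the "tolerant" pattern that allows an optional soft hyphen between letters is an assumption added for illustration, not something proposed in this thread.

```python
import re

SOFT_HYPHEN = "\xad"  # U+00AD, usually invisible in an editor

# Made-up lyric line with a soft hyphen inside the word.
lyrics = "and did my sav\xadior bleed"

# The plain word does not match, because U+00AD sits between "v" and "i".
plain = re.search(r"savior", lyrics)

# Illustrative assumption: allow an optional soft hyphen between letters.
tolerant_pattern = (SOFT_HYPHEN + "?").join(re.escape(ch) for ch in "savior")
tolerant = re.search(tolerant_pattern, lyrics)

print(plain is None, tolerant is not None)
```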
-
@Coises As Peter suggested a few minutes ago, we’re now trying the approach shown at https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text
Based on the example there, we’re testing this regex:
(?-si:<section class="lyrics">|(?!\A)\G)(?s-i:(?!</section>).)*?(?<!^)(?<!<p>)(?<!<p class="chorus">)\K(?-si:\bWORD_TO_FIND\b)
We also discovered a blemish in our previous version: The quote marks around the word “chorus” were typographical, and should have been standard typewriter-style quotes (").
We’ve been testing the regex above on live Web site files, and so far things are going well (fingers crossed as tightly as ever!).
-
Hello, @coises, @sylvester-bullitt, @peterjones and All,
@coises, you said in a previous post :
No regular expression thread is finished until @guy038 drops in to tell us that there’s a better way to do it.
Well, many thanks, @coises, for your kind words, but I definitely do not deserve this honor, because you and some other people could easily be included in this list!
I have noticed that, given the large number of regex solutions that most of us have been proposing for some time now, we’re getting fewer questions on this subject. To my mind, this means that the general level of N++ users, regarding the regex world, is increasing, which is, globally, a good thing for better N++ use, along with the other script solutions and their workflow!
I suppose that the regex section in @peterjones’s official documentation did help some of us, too, from time to time!
BTW, I did not drop in on this discussion, but the generic regex suggested by @sylvester-bullitt in his last post seems to be the right solution.
Best Regards,
guy038
-
@guy038 First, let me thank you for the work you’ve done on helping develop generic regex solutions. And you’re right, the solution I mentioned yesterday, which I was testing, seemed to be very promising.
However, I woke up in the middle of last night, and realized that we may have overlooked a potential pitfall. As you may remember, my ultimate objective was to modify texts in song lyrics, and the generic regex on the Notepad++ site seems to be an ideal fit for my use case.
Though it seems to be working well so far, I’m wondering if we overlooked one thing: the search term might be part of a hyperlink URL, and thus should not be changed. I’m running a hyperlink report on the Web site now to see if any of the links have been broken. I don’t know the answer yet, but I should know within the next hour.
If the regex did indeed match/modify/break some URLs, I plan to add a negative lookahead to exclude matches which precede .htm or an underscore. Hopefully that will be enough to prevent us from inadvertently changing links.
Have you run into the issue of breaking HTML links with a regex search-and-replace before?
-
@Sylvester-Bullitt Got done generating broken link report.
THE GOOD NEWS: My regex didn’t break any links
THE BAD NEWS: I just got lucky. Some further testing revealed that my regex would have broken links, if I had had the bad luck to use a search term that also appears in a hyperlink URL.
So, I added lookaheads to ignore matches of underscores and .htm, and it seems to work. In case anyone’s interested, here’s the new-and-improved regex, with some comments added for clarity:
(?-si:<section class="lyrics">|(?!\A)\G)(?s-i:(?!</section>).)*?(?#Not at start of line or para)(?<!^)(?<!<p>)(?<!<p class="chorus">)\K(?-si:\bWORD_TO_FIND\b(?#Not in hyperlink)(?!(\.htm))(?!_))
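As a quick check of the two negative lookaheads on their own, here is a minimal Python sketch (Python's `re`, not Boost, so the \G and \K zone logic is omitted). The HTML fragment and the replacement word "Saviour" are hypothetical, chosen only to show one match that is skipped and two that are replaced.

```python
import re

# Python rendering of just the search word plus the two lookaheads:
# skip a match immediately followed by ".htm" or by "_".
pattern = re.compile(r"\bSavior\b(?!\.htm)(?!_)")

# Made-up fragment: the word appears in a link target and in visible text.
fragment = '<a href="Savior.htm">Savior</a> my Savior'

result = pattern.sub("Saviour", fragment)
print(result)
```

This only guards against the two cases named in the post; a search term appearing elsewhere in a URL (for example as a directory name) would still match.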