    Find-in-FIles: Can’t Replace Multiple Instances of Word

    regex html
    48 Posts 5 Posters 4.8k Views
    • PeterJonesP
      PeterJones @Sylvester Bullitt
      last edited by

      @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

      1. What is the construct that resembles a lookbehind, but has the asterisk & question mark?

      That was answered here

      • Sylvester BullittS
        Sylvester Bullitt @PeterJones
        last edited by

        @PeterJones I spoke too soon. Sigh.

        I just ran this regex against live Web site files (fortunately, just Find All, not replacing anything yet):

        (?s)(\A.*?lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class=“chorus”>)\KSavior(?=(.+?</div>))
        

        This regex had ignored the <title> element in my earlier tests, but it did not ignore the title in the file text below (i.e., it matched the word Savior in the title). Can anyone see why?

        <!DOCTYPE HTML>
        <html lang="en-us">
        
        <head>
        <meta charset="utf-8">
        <title>Alas! and Did My Savior Bleed?</title>
        <meta name="alt-title" content="At the Cross">
        <meta name="description" content="Words: Isaac Watts, 1709. Music: Hugh Wilson, 1800.">
        <meta name="keywords" content="Isaac Watts,Hugh Wilson,Ralph Hudson">
        <link rel="stylesheet" href="../../../../../css/hymn.css">
        <script src="../../../../../js/jquery.js"></script>
        <script src="../../../../../js/languages.js"></script>
        <script src="../../../../../js/base.js"></script>
        <script src="../../../../../js/hymn.js"></script>
        <link rel="prev" href="../../../i/r/f/airfille.htm">
        <link rel="next" href="../../../b/n/a/abnature.htm">
        <link rel="up" href="../../../../../ttl/ttl-a.htm">
        <link rel="alternate" href="../../../../../non/es/e/n/l/a/enlacruz.htm" hreflang="es">
        <link rel="alternate" href="../../../../../non/ml/a/l/a/s/alas_and_did_my_savior_bleed_ml.htm" hreflang="ml">
        <link rel="alternate" href="../../../../../non/ml/a/l/a/s/alas_and_did_my_savior_bleed_2_ml.htm" hreflang="ml">
        </head>
        
        <body>
        <section>
        <h1 class="screen-reader-only">Scripture Verse</h1>
        <div class="css-marquee" role="marquee">
        <p><q>There is one God and one mediator between God and men, the man Christ Jesus, who gave Himself as a ransom for all men.</q> 1 Timothy 2:5–6</p>
        </div>
        </section>
        
        <section id="preface">
        <h1 class="screen-reader-only">Introduction</h1>
        <figure><img alt="portrait" src="../../../../../img/w/a/t/t/watts_i.jpg" width="200" height="300"><figcaption>Isaac Watts<br>1674–1748</figcaption></figure>
        <div class="preface-text">
        <p><span class="lead">Words:</span> <a href="../../../../../bio/w/a/t/t/watts_i.htm">Is­aac Watts</a>, <cite class="book verbose">Hymns and Spi­ri­tu­al Songs</cite> 1707–09<span class="verbose">, Book 2, num­ber 9. <q>God­ly sor­row aris­ing from the suf­fer­ings of Christ.</q> <a href="../../../../../bio/h/u/d/s/hudson_re.htm">Ralph E. Hud­son</a> wrote the re­frain in 1885</span>.</p>
        <p><span class="lead">Music:</span> <span class="music verbose">Mar­tyr­dom</span> <a href="../../../../../bio/w/i/l/s/o/n/h/wilson_h.htm">Hugh Wil­son</a>, 1800 (<a href="../../../../../mid/m/a/r/t/martyrdom.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../../../../pdf/en/m/a/r/t/Martyrdom.pdf" title="Download score, PDF format">pdf</a> <a href="../../../../../nwc/m/a/r/t/Martyrdom.nwc" title="Download score, Noteworthy Composer format">nwc</a>)<span class="verbose"> (does not use the re­frain)</span>.</p>
        <div class="alt-tune">
        <p>Alternate Tunes:</p>
        <ul>
        <li><span>Abney (Hull)</span> <a href="../../../../../bio/h/u/l/l/hull_a.htm">Asa Hull</a> (1828–1907) (<a href="../../../../../mid/a/b/n/e/abney_hull.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../../../../pdf/en/a/b/n/e/Abney(Hull).pdf" title="Download score, PDF format">pdf</a> <a href="../../../../../nwc/a/b/n/e/Abney(Hull).nwc" title="Download score, Noteworthy Composer format">nwc</a>)</li>
        <li><span>Hudson</span> <a href="../../../../../bio/h/u/d/s/hudson_re.htm">Ralph E. Hud­son</a>, <cite class="book">Songs of Peace, Love and Joy</cite> (<span class="map" onclick="show('Alliance,OH')">Al­li­ance</span> Ohio: 1885) (<a href="../../../../../mid/h/u/d/s/hudson.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../../../../pdf/en/a/t/t/h/AtTheCross.pdf" title="Download score, PDF format">pdf</a> <a href="../../../../../nwc/a/t/t/h/AtTheCross.nwc" title="Download score, Noteworthy Composer format">nwc</a>) (us­es re­frain be­low). It is with this tune that the hymn is known as <span class="hymn-title">At the Cross.</span></li>
        <li><span>Liberty Hall</span> in <cite class="book">Wy­eth’s Re­po­si­to­ry of Sac­red Mu­sic</cite>, by <a href="../../../../../bio/w/y/e/t/wyeth_j.htm">John Wy­eth</a>, 1810 (<a href="../../../../../mid/l/i/b/e/liberty_hall.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../../../../pdf/en/l/i/b/e/LibertyHall.pdf" title="Download score, PDF format">pdf</a> <a href="../../../../../nwc/l/i/b/e/LibertyHall.nwc" title="Download score, Noteworthy Composer format">nwc</a>)</li>
        </ul></div></div>
        <figure><img alt="illustration" src="../../../../../img/c/r/u/c/Crucifixion,SimonVouet.jpg" height="300" width="200"><figcaption>Crucifixion<br>Simon Vouet<br>1590–1649</figcaption></figure>
        </section>
        
        <section>
        <h1 class="screen-reader-only">Background</h1>
        <blockquote class="verbose mc">
        <p>[In] the au­tumn of 1850…re­viv­al meet­ings were be­ing held in the Thir­ti­eth Street Me­tho­dist Church. Some of us went down ev­ery ev­en­ing; and, on two oc­ca­sions, I sought peace at the atlar [sic], but did not find the joy I craved, un­til one ev­en­ing, No­vem­ber 20, 1850, it seemed to me that the light must in­deed come then or ne­ver; and so I arose and went to the al­tar alone. Af­ter a pray­er was of­fered, they be­gan to sing the grand old con­se­cr­ation hymn,</p>
        <p lang="en-gb"><q>Alas, and did my Sav­iour bleed, and did my So­ve­reign die?</q></p>
        <p>And when they reached the third line of the fourth stan­za,</p>
        <p><q>Here Lord, I give my­self away,</q></p>
        <p>My very soul was flood­ed with a ce­les­tial light. I sprang to my feet, shout­ing <q>hal­le­lu­jah,</q> and then for the first time I real­ized that I had been try­ing to hold the world in one hand and the Lord in the oth­er.</p>
        <p><a href="../../../../../bib/c/crosby.htm">Crosby</a>, p. 24</p>
        </blockquote>
        </section>
        
        <section class="lyrics">
        <div class="audio"><audio class="primary" controls loop><source src="../../../../../ogg/m/a/r/t/martyrdom.ogg" type="audio/ogg"></audio></div>
        <h1 class="screen-reader-only">Lyrics</h1>
        <div class="stanzas"><div class="lyrics-text mc ll">
        <p>Alas! and did my Sav­ior bleed<br>
        And did my So­ver­eign die?<br>
        Would He de­vote that sac­red head<br>
        For such a worm as I?</p>
        <p class="chorus">Refrain</p>
        <p class="chorus">At the cross, at the cross where I first saw the light,<br>
        And the bur­den of my heart rolled away,<br>
        It was there by faith I re­ceived my sight,<br>
        And now I am hap­py all the day!</p>
        <p>Thy bo­dy slain, sweet Je­sus, Thine,<br>
        And bathed in its own blood,<br>
        While all ex­posed to wrath di­vine,<br>
        The glo­ri­ous Suf­fer­er stood!</p>
        <p class="chorus">Refrain</p>
        <p>Was it for crimes that I had done<br>
        He groaned up­on the tree?<br>
        Amazing pi­ty! grace un­known!<br>
        And love be­yond de­gree!</p>
        <p class="chorus">Refrain</p>
        <p>Well might the sun in dark­ness hide<br>
        And shut his glo­ries in,<br>
        When Christ, the migh­ty Mak­er died,<br>
        For man the crea­ture’s sin.</p>
        <p class="chorus">Refrain</p>
        <p>Thus might I hide my blush­ing face<br>
        While His dear cross ap­pears,<br>
        Dissolve my heart in thank­ful­ness,<br>
        And melt my eyes to tears.</p>
        <p class="chorus">Refrain</p>
        <p>But drops of grief can ne’er re­pay<br>
        The debt of love I owe:<br>
        Here, Lord, I give my self away<br>
        ’Tis all that I can do.</p>
        <p class="chorus">Refrain</p>
        </div></div>
        </section>
        
        </body>
        </html>
        
        
        • CoisesC
          Coises @Sylvester Bullitt
          last edited by Coises

          @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

          Does anyone have any more pearls of wisdom to add to this adventure?

          Be aware that these expressions match parts of words; e.g., the “star” in “starlight” or “restart” will be matched. I’ll leave it as an exercise for you to study a bit and attempt to find a fix for that, if it is a problem.

          No regular expression thread is finished until @guy038 drops in to tell us that there’s a better way to do it.

          • CoisesC
            Coises @Sylvester Bullitt
            last edited by

            @Sylvester-Bullitt

            You’ve found another glitch:

            If there are no matches following lyrics-text, the expression we’ve suggested will match from the beginning of the file.

            All three matches are in the head section of the document. There are no matches after lyrics-text, because the word is hyphenated in the lyrics text.
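To make the hyphenation point concrete, here is a minimal Python sketch (not Notepad++ itself). It assumes the lyrics contain a soft hyphen, U+00AD, inside the word, as described above. The two sample strings are shortened copies of lines from the posted file.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, invisible in most renderings

title_line = "<title>Alas! and Did My Savior Bleed?</title>"
lyrics_line = f"<p>Alas! and did my Sav{SOFT_HYPHEN}ior bleed<br>"

# The literal word is present in the title...
print(re.search("Savior", title_line) is not None)
# ...but not in the lyrics, where a soft hyphen splits it.
print(re.search("Savior", lyrics_line) is not None)
# ascii() escapes the hidden character so it becomes visible.
print(ascii(lyrics_line))
```

The two lines look the same on screen, which is why the failed match in the lyrics is easy to miss.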

            • Sylvester BullittS
              Sylvester Bullitt @Coises
              last edited by

              @Coises Is the glitch in the regular expression itself, or in the regex engine?

              • PeterJonesP
                PeterJones @Coises
                last edited by

                @Coises said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

                No regular expression thread is finished until @guy038 drops in to tell us that there’s a better way to do it.

                No kidding. But with the most recent failure, I think @guy038’s FAQ has already given us the solution that we should have used, if the OP had not stated it as an XY-Problem.

                Looking at the example data, I think the better problem statement would be “please replace all instances of WORD_TO_FIND between the start <section class="lyrics" and the end </section>”. With that set of rules, just use the Generic Regex Formula > Replace in a specific zone of text, with FR = \bWORD_TO_FIND\b; the “start” and “end” from my problem statement are the BSR and ESR. (Though you might have to add your (?<!^)(?<!<p>)(?<!<p class="chorus">) restrictions to the FR.)
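For readers who want to see the "replace only inside a zone" idea outside Notepad++, here is a rough Python sketch. It is a two-pass approach (find each zone, then replace inside it), not the single-expression generic formula from the FAQ. The function name `replace_in_zone` and the sample text are made up for illustration.

```python
import re

def replace_in_zone(text, word, replacement,
                    start=r'<section class="lyrics"', end=r"</section>"):
    """Replace whole-word `word` only between `start` and the next `end`."""
    # Pass 1: locate each zone (start marker, body, end marker).
    zone = re.compile(f"({start})(.*?)({end})", re.DOTALL)
    # Pass 2: whole-word replacement applied to the zone body only.
    target = re.compile(rf"\b{re.escape(word)}\b")

    def fix_zone(m):
        body = target.sub(lambda _: replacement, m.group(2))
        return m.group(1) + body + m.group(3)

    return zone.sub(fix_zone, text)

sample = (
    "<title>My Savior</title>\n"
    '<section class="lyrics">\n<p>Alas! and did my Savior bleed</p>\n</section>\n'
)
result = replace_in_zone(sample, "Savior", "Saviour")
print(result)
```

Text outside the zone, such as the title, is never touched, so a file with no match inside the zone comes back unchanged.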

                Is the glitch in the regular expression itself, or in the regex engine?

                The “glitch” is the expectation that one can safely edit HTML with regex (see FAQ). Since I’m sure you’ll insist on it anyway, then dealing with glitches is something you must expect, and that you must start putting effort into.

                We’ve gone above-and-beyond in getting it working this well for you. At this point, it’s really time for you to start reading the same documentation that we’re reading, and try to figure it out on your own.

                • CoisesC
                  Coises @Sylvester Bullitt
                  last edited by

                  @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

                  @Coises Is the glitch in the regular expression itself, or in the regex engine?

                  The expression. After testing, it appears that my variant, which avoids matching files that don’t contain lyrics-text, also fixes this problem. Applying a small simplification to the previous version (the \G was unnecessary), this:

                  (?s)(\A.*?(lyrics-text|\Z(*COMMIT)(*FAIL)).*?|)(?<!^)(?<!<p>)(?<!<p class="chorus">)\Ksavior(?=(.+?</div>))
                  

                  matches nothing, as it should; this:

                  (?s)(\A.*?(lyrics-text|\Z(*COMMIT)(*FAIL)).*?|)(?<!^)(?<!<p>)(?<!<p class="chorus">)\Ksav\xADior(?=(.+?</div>))
                  

                  matches the single occurrence of the word (which is hyphenated using a “soft hyphen”) in the lyrics, on line 63.
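If the position of the soft hyphen is not known in advance, one option is to allow an optional U+00AD between every pair of letters. The Python sketch below shows that idea. The helper name `soft_hyphen_tolerant` is invented here, and the case-insensitive flag is an assumption about how the search would be run.

```python
import re

def soft_hyphen_tolerant(word):
    """Build a pattern matching `word` with optional U+00AD between letters."""
    return re.compile("\u00ad?".join(re.escape(ch) for ch in word), re.IGNORECASE)

pattern = soft_hyphen_tolerant("savior")
plain = "did my Savior bleed"
hyphenated = "did my Sav\u00adior bleed"

# Both spellings match; the hyphenated match includes the hidden character.
print(pattern.search(plain) is not None)
print(pattern.search(hyphenated) is not None)
```

The same pattern text (`s\xAD?a\xAD?v\xAD?i\xAD?o\xAD?r`) could in principle be pasted into the FR part of a Notepad++ expression, but that has not been tested here.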

                  • Sylvester BullittS
                    Sylvester Bullitt @Coises
                    last edited by

                    @Coises As Peter suggested a few minutes ago, we’re now trying the approach shown at https://community.notepad-plus-plus.org/topic/22690/generic-regex-replacing-in-a-specific-zone-of-text

                    Based on the example there, we’re testing this regex:

                    (?-si:<section class="lyrics">|(?!\A)\G)(?s-i:(?!</section>).)*?(?<!^)(?<!<p>)(?<!<p class="chorus">)\K(?-si:\bWORD_TO_FIND\b)
                    

                    We also discovered a blemish in our previous version: The quote marks around the word “chorus” were typographical, and should have been standard typewriter-style quotes (").

                    We’ve been testing the regex above on live Web site files, and so far things are going well (fingers crossed as tightly as ever!).

                    • guy038G
                      guy038
                      last edited by guy038

                      Hello, @coises, @sylvester-bullitt, @peterjones and All,

                      @coises, you said in a previous post :

                      No regular expression thread is finished until @guy038 drops in to tell us that there’s a better way to do it.

                      Well, many thanks, @coises, for your kind words, but I definitely do not deserve this honor, because you and some other people could easily be included in this list!

                      I noticed that, given the large number of regex solutions that most of us have been proposing for some time now, we’re getting fewer questions on this subject. To my mind, this means that the general regex level of N++ users is increasing, which is a good thing for better N++ use, along with the other script solutions and their workflow!

                      I suppose that the regex section of @peterjones’s official documentation did help some of us, too, from time to time!

                      BTW, I did not drop in on this discussion, but the generic regex suggested by @sylvester-bullitt in his last post seems to be the right solution.

                      Best Regards,

                      guy038

                      • Sylvester BullittS
                        Sylvester Bullitt @guy038
                        last edited by

                        @guy038 First, let me thank you for the work you’ve done on helping develop generic regex solutions. And you’re right, the solution I mentioned yesterday, which I was testing, seemed to be very promising.

                        However, I woke up in the middle of last night, and realized that we may have overlooked a potential pitfall. As you may remember, my ultimate objective was to modify texts in song lyrics, and the generic regex on the Notepad++ site seems to be an ideal fit for my use case.

                        Though it seems to be working well so far, I’m wondering if we overlooked one thing: the search term might be part of a hyperlink URL, and thus should not be changed. I’m running a hyperlink report on the Web site now to see if any of the links have been broken. I don’t know the answer yet, but I should know within the next hour.

                        If the regex did indeed match/modify/break some URLs, I plan to add a negative lookahead to exclude matches which precede .htm or an underscore. Hopefully that will be enough to prevent us from inadvertently changing links.

                        Have you run into the issue of breaking HTML links with a regex search-and-replace before?

                        • Sylvester BullittS
                          Sylvester Bullitt @Sylvester Bullitt
                          last edited by Sylvester Bullitt

                          @Sylvester-Bullitt I’ve finished generating the broken link report.

                          THE GOOD NEWS: My regex didn’t break any links

                          THE BAD NEWS: I just got lucky. Some further testing revealed that my regex would have broken links, if I had had the bad luck to use a search term that also appears in a hyperlink URL.

                          So, I added negative lookaheads to reject matches that are followed by an underscore or .htm, and it seems to work. In case anyone’s interested, here’s the new-and-improved regex, with some comments added for clarity:

                          (?-si:<section class="lyrics">|(?!\A)\G)(?s-i:(?!</section>).)*?(?#Not at start of line or para)(?<!^)(?<!<p>)(?<!<p class="chorus">)\K(?-si:\bWORD_TO_FIND\b(?#Not in hyperlink)(?!(\.htm))(?!_))
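The hyperlink guard can be tried on its own, as in the Python sketch below. Python’s `re` has no `\G` or `\K`, so this covers only the word-plus-lookahead part, not the zone logic. The search word and the three sample strings (including the file names) are invented for illustration.

```python
import re

# Whole word, not followed by ".htm" or "_" (the two exclusions from the post).
pattern = re.compile(r"\bsavior\b(?!\.htm)(?!_)", re.IGNORECASE)

lyric = "<p>Hail the Savior of the world</p>"
link = '<a href="../s/a/v/savior.htm">more</a>'
slug = '<a href="alas_and_did_my_savior_bleed.htm">more</a>'

# Only the lyric line should produce a match.
for text in (lyric, link, slug):
    print(bool(pattern.search(text)), text)
```

In Python’s `re`, `_` counts as a word character, so `\b` alone already rejects the underscore-delimited slug and the explicit `(?!_)` is a second safeguard there. Whether the Notepad++ engine treats underscores the same way is worth checking before relying on it.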
                          
                          The Community of users of the Notepad++ text editor.