    Find-in-Files: Can’t Replace Multiple Instances of Word

    regex html
    • Sylvester Bullitt

      I’m writing a regular expression to replace words in an HTML file. To match
      the regex, the word to replace must:

      1. Be somewhere after the occurrence of the string “lyrics-text” (I’m editing song lyrics).

      2. Not be at the start of a line, and not immediately after <p> or <p class="chorus">.

      My regex:

      FIND

      lyrics-text.+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kword_to_find(?=(.+?</div>))
      

      REPLACE WITH:

      replacement_word
      

      When I manually run the regex against a single file using the Find dialog, Replace tab, I check the boxes Match case, Wrap around & . matches newline, and the replacement works as expected. If there are multiple instances of word_to_find in the file, it replaces them one at a time as I click Replace.

      However, when I try the replacement using the Find in Files tab and the Replace in Files button, it replaces only the first instance of the word_to_find in each file. Subsequent instances of word_to_find in the same file are ignored.

      I don’t see any options on the Find in Files tab regarding multiple matches. What am I missing?

      I can provide a sample file if needed, but was trying to keep the question as simple as possible.

      • Coises @Sylvester Bullitt

        @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

        I don’t see any options on the Find in Files tab regarding multiple matches. What am I missing?

        You’re missing that the expression you’ve given should only match once for each occurrence of lyrics-text. I think that if you use Replace All instead of clicking Replace multiple times, you’ll see it only matches once in the active file (unless lyrics-text occurs more than once). It’s only because of Wrap around that the individual Replaces keep matching. Wrap around has no effect on multiple matches; matching starts at the beginning and moves forward. Once the expression matches lyrics-text, there’s no way for it to go backward and match that same text again.

        You can resolve this by clicking Replace in Files repeatedly until nothing more is replaced. I don’t think there is a way to do it in a single step.
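The one-match-per-pass behavior can be reproduced outside Notepad++. This is a minimal Python sketch, not Notepad++ code: Python's standard `re` module has no `\K`, so a capturing group stands in for it, and the sample text and the names `SAMPLE` and `replace_once_per_anchor` are made up for illustration.

```python
import re

# Hypothetical sample: one "lyrics-text" anchor followed by three target words.
SAMPLE = '<div class="lyrics-text">\n<p>la word la word la word</p>\n</div>'

# Group 1 stands in for \K (assumption: Python's `re` lacks \K), so the
# replacement puts the captured prefix back in front of the new word.
PATTERN = re.compile(r'(lyrics-text.+?)word(?=.+?</div>)', re.DOTALL)

def replace_once_per_anchor(text):
    """One left-to-right pass over the text, like a single Replace All."""
    return PATTERN.subn(r'\1NEW', text)

# The pass consumes "lyrics-text" on its first match and cannot go back,
# so only one of the three words is replaced.
text, count = replace_once_per_anchor(SAMPLE)
```

Calling `replace_once_per_anchor(text)` again replaces the next occurrence, which mirrors clicking Replace in Files a second time.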

        Note: I’m surprised Replace (as opposed to Replace All) works at all. Usually when \K is involved, individual replaces fail (for a technical reason connected to how Notepad++ identifies matches for replacement).

        • Sylvester Bullitt

          I probably didn’t explain the use case very well.

          I actually DO want the regex to match multiple times if there are multiple instances of the search term within the lyrics block.

          And, when I do it manually, it does exactly what I want, as long as I keep clicking the replace button.

          The failing behavior occurs when I’m searching multiple files (in my case, could be over 15,000). It finds the correct files, but only replaces one instance of the search term in each file.

          After studying this some more, I wonder if the issue is that the cursor position changes after each match? If so, when the search for the next match starts, the cursor is past the lyrics-text string, and so any subsequent instances of word_to_find don’t match, because the regex engine doesn’t realize the instances really are after lyrics-text (I’m on thin ice here, just guessing).

          Would it help if I explicitly stated that the search should start from the top of the file? Is that even possible? Would it look something like this?

          \A.+?lyrics-text.+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kword_to_find(?=(.+?</div>))
          
          • Coises @Sylvester Bullitt

            @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

            After studying this some more, I wonder if the issue is that the cursor position changes after each match? If so, when the search for the next match starts, the cursor is past the lyrics-text string, and so any subsequent instances of word_to_find don’t match, because the regex engine doesn’t realize the instances really are after lyrics-text (I’m on thin ice here, just guessing).

            That is correct.

            Would it help if I explicitly stated that the search should start from the top of the file? Is that even possible? Would it look something like this?

            It is possible, and your suggestion would do that, but it wouldn’t help. The first match would start from the beginning of the file; then it would be past the beginning of the file, so it wouldn’t match any more.

            For a circumstance where there can be over 15,000 files, and presumably an unknown number of matches in each file, I have to suggest that you’d be better with a scripting tool than a text editor, if you are familiar with any scripting language.

            • Sylvester Bullitt @Coises

              @Coises Thanks for the quick reply.

              I’m wondering if a small interface change to NPP would help solve this use case.

              The manual search-and-replace for a single file has a “wraparound” option. Is there some reason this functionality couldn’t be added to the search-and-replace in multiple files?

              Regarding a scripting language, I am familiar with PowerShell (posh). Could you give me some insight on how you think posh might be used with NPP?

              • Alan Kilborn @Sylvester Bullitt

                @Sylvester-Bullitt said:

                The manual search-and-replace for a single file has a “wraparound” option. Is there some reason this functionality couldn’t be added to the search-and-replace in multiple files?

                Yes, there is: Because it doesn’t make sense.
                Let’s think about how Replace in Files works:

                • N++ opens the file into a tab (that isn’t shown to the user)
                • N++ searches the tab data from the start of the first line to the end of the last line, replacing matches as it goes
                • N++ closes the unseen tab

                Where is the opportunity for “wrap around” in the above scenario?


                Understand also that Wrap around is ONLY considered when doing an interactive search operation (Find Next or Replace). All other search types (e.g. Replace All, …) do NOT “wrap” and the Wrap around control (when used) means “Entire document, from top to bottom”.

                • Alan Kilborn @Sylvester Bullitt

                  @Coises said:

                  where there can be over 15,000 files, and presumably an unknown number of matches in each file, I have to suggest that you’d be better with a scripting tool than a text editor, if you are familiar with any scripting language.

                  @Sylvester-Bullitt said :

                  Regarding a scripting language, I am familiar with PowerShell (posh). Could you give me some insight on how you think posh might be used with NPP?

                  In the context of what @Coises said about scripting, there’s a couple of types:

                  • scripting with a N++ scripting plugin, using data open in a N++ tab
                  • scripting outside of N++, acting on files in the file system (i.e., really, nothing to do with N++)

                  Given that you have a feel for PowerShell scripting, this would have to be “outside of N++”, and thus detailed discussion of that here is off-topic, and thus disallowed. However, I would not object if @Coises (or anyone else) has some really general tips for you here, in a limited discussion.

                  • Sylvester Bullitt @Alan Kilborn

                    @Alan-Kilborn Seems to me a simple modification to NPP’s logic for a multi-file search-and-replace might handle it:

                    for_each_file {
                        loop {
                            Perform the search-and-replace on the file.
                            If search found zero matches {
                                Exit the loop, and proceed to next file.
                            }
                            Else return to top of loop & repeat the search-and-replace
                        } end_loop
                    }
                    

                    The secret sauce is: Don’t exit the loop until zero matches are found in the file. That ensures that all instances of the search term have been found and replaced.
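A rough Python rendering of this repeat-until-zero idea, purely as an illustration and not Notepad++ source. The names `replace_until_stable` and `max_passes` are hypothetical, and the pass limit is a safeguard added for this sketch; the pseudocode above has no such limit.

```python
import re

def replace_until_stable(text, pattern, replacement, max_passes=1000):
    """Repeat a whole-text replace until a pass finds zero matches.

    Returns (new_text, total_replacements). max_passes is an added
    safeguard for this sketch, not part of the proposal itself.
    """
    total = 0
    for _ in range(max_passes):
        # One full pass from the start of the text to the end.
        text, count = re.subn(pattern, replacement, text, flags=re.DOTALL)
        if count == 0:
            return text, total
        total += count
    raise RuntimeError("gave up after %d passes" % max_passes)

# Group 1 stands in for \K, which Python's `re` does not support.
result, total = replace_until_stable(
    "<x> a word b word </x>", r"(<x>.+?)word", r"\1NEW"
)
```

Each pass replaces one occurrence here, so the loop needs two passes that change something plus a final pass that finds nothing.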

                    And yes, there is a performance hit. But (1) it only occurs if there are multiple matches in the file and (2) the hit is minuscule compared to the work involved in making all the extra search-and-replaces manually. In my use case, I’ll have to perform manual search-and-replaces on about 500 files, which is a non-starter.

                    Another option would be to simply use the same search-and-replace logic that is invoked by the Replace All button on the single-file-search-and-replace dialog.

                    • Alan Kilborn @Sylvester Bullitt

                      @Sylvester-Bullitt said:

                      a simple modification to NPP’s logic for a multi-file search-and-replace might handle it
                      Don’t exit the loop until zero matches are found in the file. That ensures that all instances of the search term have been found and replaced

                      You can try making an official feature request for it if you’d like; see HERE for tips on doing so.

                      The idea has some downsides, that may limit its appeal to developers:

                      • it isn’t a frequent need
                      • it wouldn’t be used by many users
                      • it could introduce “infinite loop” potential that would need to be addressed (consider simple and unrealistic – but possible – case of replacing a with a)

                      There may be more downsides (haven’t had enough coffee yet this morning to fully consider), but even the last point above is enough such that calling it “a simple modification” is an oversimplification…

                      • Alan Kilborn @Sylvester Bullitt

                        @Sylvester-Bullitt

                        If you’d be receptive to PythonScript scripting within Notepad++, I could give you some tips for your specific need, but I won’t bother unless you have some enthusiasm for it.

                        • Sylvester Bullitt @Alan Kilborn

                          @Alan-Kilborn Those downsides are easily avoided:

                          1. Since only few users might want that functionality, use the current behavior as the default. Use the new behavior as a selectable option (e.g., a check box).

                          2. To avoid the infinite loop: Compare the search string with the replacement string before beginning the operation. If they’re identical, show an error message to the user and do nothing else.

                          • Alan Kilborn @Sylvester Bullitt

                            @Sylvester-Bullitt said:


                            use the current behavior as the default. Use the new behavior as a selectable option (e.g., a check box).

                            Well, that much is obvious, but it does nothing for any deeper problems.
                            Of course, if you actually make a true feature request, you should go a step further and provide some suggested captioning for the checkbox.


                            Those downsides are easily avoided

                            …before beginning the operation. If they’re identical…

                            Sorry, no. In your naïveté, you can’t see it. Like a lot of users, you oversimplify a situation…and think software is “easy” (you’ve stated that twice now, tsk tsk).
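One possible illustration of why comparing the two strings may not be enough. This example is an assumption added here, not something spelled out in the thread: a replacement can differ from the search expression and still produce text that the expression matches again. The helper `passes_until_stable` and its `limit` parameter are hypothetical.

```python
import re

def passes_until_stable(text, pattern, replacement, limit=5):
    """Count replace passes until nothing matches; None if limit is hit."""
    for n in range(limit):
        text, count = re.subn(pattern, replacement, text)
        if count == 0:
            return n
    return None

# Search "a" and replacement "ba" are different strings, yet every pass
# leaves an "a" behind, so a repeat-until-zero loop would never finish.
never_settles = passes_until_stable("a", r"a", "ba")

# A replacement that removes what the pattern needs settles after one
# pass that changes something.
settles = passes_until_stable("a", r"a", "b")
```

With a regular expression as the search term, the check is harder still, since the question becomes whether the replaced text can match the pattern again, not whether two strings are equal.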

                            But it is all moot anyway, unless you make a feature request as a first step.

                            Cheers, and good luck.

                            • PeterJones @Sylvester Bullitt

                              @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

                              Is there some reason this functionality couldn’t be added to the search-and-replace in multiple files?

                              Your regex doesn’t do all the replacements in a single file when you do Replace All once, and you have to hit Replace or Replace All multiple times to get it to work, even with your “wraparound” option in a single file. Why would you expect the Find in Files version to behave any differently across multiple files? The problem is not with Notepad++'s user interface not having enough options, but in your understanding of the regex involved, and the way that “wrap around” works.

                              Regarding “wrap around”: <PrincessBride character="Inigo">I do not think that word means what you think it means</PrincessBride>. It’s a holdover from the terminology present in Microsoft Windows’ “notepad.exe” search dialog since time immemorial: they called it “wrap around”, because in normal searches, the search starts where your cursor is, but if it reaches the end of the file, it “wraps around” to the beginning once.

                              In the Notepad++ implementation, it follows that behavior for a standard search or replace. But with Search All or Replace All, as described here, it makes exactly one pass through the entire document, starting at the beginning (regardless of where your caret is) and going through the very end.

                              This “exactly one pass” is why your original regex with Replace All in one file didn’t replace all instances unless you hit it multiple times, and it’s exactly the same reason Find in Files doesn’t replace every instance when you hit Replace in Files once.

                              You designed a regex that “consumes” too much of the data, so it cannot do it all in one – unless you put it into an infinite loop mode (which I think is not the right idea, nor do I think Notepad++ should have to implement an “infinite loop mode”). The regex syntax has a \G assertion, which means “match at the end of previous match”, which can be used in a regex alternation to good effect (see, for example, its use in the FAQ: Generic Regex => Replacing in a Zone, where once you enter a zone, \G allows you to continue in the same zone, even though the regex cannot “see” the beginning of the zone any more).

                              A simple example text to show what \G allows in an alternation:

                              line prefix: word another word final word
                              word another word final word
                              line prefix: word another word final word
                              

                              The regex FIND=^line prefix:.*?\Kword REPLACE=newText only replaces the first word on the line prefix: lines, so you have to hit Replace All three times to do them all.

                              But if you change it to FIND=(^line prefix:|\G).*?\Kword, then the same action only takes one Replace All.
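The difference between the two expressions can be imitated in Python, with the caveat that Python's standard `re` module supports neither `\K` nor `\G`. In this sketch a capturing group stands in for `\K`, and a different technique stands in for the `\G` alternation: each prefixed line is matched as a zone and every `word` inside it is replaced by a callback. The names `passes_needed` and `replace_in_zones` are hypothetical.

```python
import re

TEXT = (
    "line prefix: word another word final word\n"
    "word another word final word\n"
    "line prefix: word another word final word\n"
)

# Stand-in for ^line prefix:.*?\Kword (group 1 plays the role of \K).
FIRST_ONLY = re.compile(r"^(line prefix:.*?)word", re.MULTILINE)

def passes_needed(text):
    """Run whole-text passes until nothing matches; return (text, passes)."""
    passes = 0
    while True:
        text, count = FIRST_ONLY.subn(r"\1newText", text)
        if count == 0:
            return text, passes
        passes += 1

# Stand-in for the \G version: match each prefixed line as a zone, then
# replace every "word" inside that zone during the same pass.
ZONE = re.compile(r"^line prefix:.*$", re.MULTILINE)

def replace_in_zones(text):
    return ZONE.sub(lambda m: m.group(0).replace("word", "newText"), text)

multi_pass_text, passes = passes_needed(TEXT)
single_pass_text = replace_in_zones(TEXT)
```

The first approach needs three passes that change something, the second needs one, and both leave the unprefixed middle line alone.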

                              Thus, if your original regex was working for doing a single replacement that had to be run multiple times:

                              lyrics-text.+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kword_to_find(?=(.+?</div>))
                              

                              … then my guess is that this slight change, putting an alternation group before the \K with a \G as the second choice in the alternative section, would work (but I don’t have the time to test it out thoroughly for you):

                              (?:lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kword_to_find(?=(.+?</div>))
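For comparison, the same goal can be sketched with Python's standard `re` module, which has no `\K` or `\G`. This is a swapped-in technique, not the regex above: the lyrics block is matched as one zone, and an inner pattern with the same three lookbehinds is applied inside it. It assumes the zone ends at the first `</div>` after `lyrics-text`, and the name `replace_in_lyrics` and the sample text are made up for illustration.

```python
import re

# Zone: from "lyrics-text" to the first following </div>. This is an
# assumption about the page layout, not a general HTML parser.
ZONE = re.compile(r"lyrics-text.+?</div>", re.DOTALL)

def replace_in_lyrics(html, word, replacement):
    # Same three lookbehinds as the thread's regex: not at line start,
    # not right after <p>, not right after <p class="chorus">.
    inner = re.compile(
        r'(?<!^)(?<!<p>)(?<!<p class="chorus">)' + re.escape(word),
        re.MULTILINE,
    )
    # The lambdas keep the replacement literal (no backslash escapes).
    return ZONE.sub(
        lambda zone: inner.sub(lambda _: replacement, zone.group(0)),
        html,
    )

SAMPLE = (
    "<title>star</title>\n"
    '<div class="lyrics-text">\n'
    "<p>star light, little star,<br>\n"
    "star bright, tiny star.</p>\n"
    '<p class="chorus">star of stars</p>\n'
    "</div>\n"
    "<p>star</p>\n"
)

result = replace_in_lyrics(SAMPLE, "star", "sun")
```

Like the original expression, the inner pattern has no word boundaries, so `stars` becomes `suns`; text outside the zone is left untouched.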
                              

                              ----

                              Useful References

                              • Template for Search/Replace Questions
                              • Notepad++ Online User Manual: Searching/Regex
                              • FAQ: Where to find other regular expressions (regex) documentation
                              • Sylvester Bullitt @PeterJones

                                @PeterJones Interesting. I’ve never had an occasion to use /g before. Maybe now’s the time for me to learn something new about the mysteries of regex.

                                At any rate, I probably won’t be able to look at it today. I’ll try to start digging into your suggestions tomorrow. Thanks for taking the time to give such detailed feedback!

                                • PeterJones @Sylvester Bullitt

                                  @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

                                  /g

                                  Exact characters and capitalization are important in regex. /g means “the literal / character followed by the literal lower-case g character”, whereas I used \G, which is the Continuation Escape found in the Anchors section of the regex documentation, and holds special meaning to regex. Using /g instead of \G will not give the same results.

                                  • Coises @Sylvester Bullitt

                                    @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

                                    Regarding a scripting language, I am familiar with PowerShell (posh). Could you give me some insight on how you think posh might be used with NPP?

                                    I would not attempt to use PowerShell with, but rather instead of Notepad++.

                                    I’m afraid I haven’t yet learned PowerShell (despite using it anyway), but you probably want to start by reviewing:

                                    Get-ChildItem
                                    Get-Content
                                    New-Item
                                    about Regular Expressions
                                    -match and -replace operators

                                    Establish a folder where you’ll put your results.
                                    Get the collection of files you want to examine.
                                    For each file, get the content.
                                    Determine if the content requires modification.
                                    Iterate through the content making the needed modifications.
                                    Copy the file if it’s unchanged, write a new file if it is changed.
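The steps above can be sketched as follows. The sketch is in Python rather than PowerShell, and everything in it is an assumption for illustration: the function name `process_tree`, its parameters, the `*.htm` glob, UTF-8 encoding, and a results folder located outside the source folder.

```python
import re
import shutil
from pathlib import Path

def process_tree(src_dir, out_dir, pattern, replacement, glob="*.htm"):
    """Copy every matching file into out_dir, rewriting those that match.

    Returns the number of files whose content changed. Assumes out_dir
    is not inside src_dir and that the files are UTF-8.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)           # results folder
    regex = re.compile(pattern)
    changed = 0
    for path in sorted(src.rglob(glob)):             # collection of files
        if not path.is_file():
            continue
        target = out / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        # newline="" keeps the original line endings untouched.
        with open(path, encoding="utf-8", newline="") as fh:
            content = fh.read()                      # get the content
        new_content, count = regex.subn(replacement, content)
        if count == 0:
            shutil.copy2(path, target)               # unchanged: copy
        else:
            with open(target, "w", encoding="utf-8", newline="") as fh:
                fh.write(new_content)                # changed: write new file
            changed += 1
    return changed
```

Writing into a separate results folder leaves the originals untouched, so a pattern that turns out to be wrong can be fixed and rerun.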

                                    • Sylvester Bullitt @Coises

                                      @Coises Got it. Thanks!

                                      • Sylvester Bullitt @PeterJones

                                        @PeterJones I tried the regex modification you suggested

                                        (?:lyrics-text|\G).+?(?<!^)(?<!<p>)(?<!<p class="chorus">)\Kstar(?=(.+?</div>))
                                        

                                        To keep it simple, I ran it against a single file in the editor (rather than multiple files on disk). The file text that was in the editor:

                                        <!DOCTYPE HTML>
                                        <html lang="en-us">
                                        
                                        <head>
                                        <meta charset="utf-8">
                                        <title>Twinkle, Twinkle, Little Star</title>
                                        <meta name="description" content="Words: Jane Taylor, 1806. Music: ___, ___">
                                        <meta name="keywords" content="Jane Taylor">
                                        <link rel="stylesheet" href="../../css/hymn.css">
                                        <script src="../../js/jquery.js"></script>
                                        <script src="../../js/base.js"></script>
                                        <script src="../../js/hymn.js"></script>
                                        <link rel="prev" href="../../htm/h/e/w/o/hewonsav.htm">
                                        <link rel="next" href="../../htm/h/e/s/a/hesallwo.htm">
                                        <link rel="up" href="../../ttl/ttl-h.htm">
                                        </head>
                                        
                                        <body>
                                        
                                        <section id="preface">
                                        <h1 class="screen-reader-only">Introduction</h1>
                                        <div class="preface-text">
                                        <p><span class="lead">Words:</span> <a href="../../bio/t/a/y/l/taylor_jane.htm">Jane Tay­lor</a>, 1806.</p>
                                        <p><span class="lead">Music:</span> John Doe  (<a href="../../mid/d/u/m/m/dummy.mid" title="Listen to music, MIDI format">🔊</a> <a href="../../pdf/en/d/u/m/m/Dummy.pdf" title="Download score, PDF format">pdf</a> <a href="../../nwc/d/u/m/m/Dummy.nwc" title="Download score, Noteworthy Composer format">nwc</a>).</p>
                                        </div>
                                        </section>
                                        
                                        <p>This page is used to test glo­bal search-and-replace us­ing re­gu­lar ex­pressions.</p>
                                        
                                        <section class="lyrics">
                                        <h1 class="screen-reader-only">Lyrics</h1>
                                        <div class="stanzas"><div class="lyrics-text mc ll">
                                        <p>Twinkle, twinkle, little star,<br>
                                        How I wonder what you are!<br>
                                        Up above the world so high,<br>
                                        Like a diamond in the sky.</p>
                                        <p>When the blazing sun is gone,<br>
                                        When he nothing shines upon,<br>
                                        Then you show your little light,<br>
                                        Twinkle, twinkle, all the night.</p>
                                        <p>Then the trav’ller in the dark,<br>
                                        Thanks you for your tiny spark,<br>
                                        He could not see which way to go,<br>
                                        If you did not twinkle so.</p>
                                        <p>In the dark blue sky you keep,<br>
                                        And often thro’ my curtains peep,<br>
                                        For you never shut your eye,<br>
                                        Till the sun is in the sky.</p>
                                        <p>’Tis your bright and tiny spark,<br>
                                        Lights the trav’ller in the dark:<br>
                                        Tho’ I know not what you are,<br>
                                        Twinkle, twinkle, little star.</p>
                                        </div></div>
                                        </section>
                                        
                                        </body>
                                        </html>
                                        

                                        I put the cursor at the beginning of the file, clicked the Find Next button, and got this error.

                                        If I click the Replace All button instead, a different message appears: Replace All: 0 occurrences were replaced in entire file.

                                        • PeterJones @Sylvester Bullitt

                                          @Sylvester-Bullitt said in Find-in-FIles: Can’t Replace Multiple Instances of Word:

                                          The file text that was in the editor:

                                          I tried that text with your original regex from your first post, and got the same result.

                                          As I said, “if your original regex was working for doing a single replacement that had to be run multiple times”. Your text didn’t match your regex even once, thus the “if” condition was not met, and you should not expect my modification to work.

                                          • Sylvester Bullitt @PeterJones

                                            @PeterJones Oops. Mea culpa.

                                            Forgot to change the search mode to Regular Expression. After doing that, and clicking Replace All, all 3 occurrences were replaced in one fell swoop. Sorry for the confusion.

                                            My next step is to try it on multiple files on the hard disk.

                                            Whew!
