How to find and highlight a specific occurrence of a symbol?
Rumi Balkhi
How can I find and highlight a specific occurrence of a symbol in a large text file?
I need to find the 6th occurrence of =========== and highlight it.
Scott Sumner
What have you tried and how isn’t it working? It might be instructive to see your thought processes in finding a solution…
Rumi Balkhi
I tried to find it using the following code
But it's manual, and it highlights every =========
I want only a specific occurrence. It could be the 6th, the 60th, or even the 100th.
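For comparison outside Notepad++, here is a minimal Python sketch of the same task without any regex: step through the text with str.find and stop at the n-th hit. The function name nth_occurrence and the sample text are hypothetical, and the sketch counts non-overlapping occurrences, so a single line of 22 "=" would count as two.

```python
def nth_occurrence(text: str, needle: str, n: int) -> int:
    """Hypothetical helper: index of the n-th (1-based) non-overlapping occurrence, or -1."""
    if n < 1 or not needle:
        raise ValueError("n must be >= 1 and needle must be non-empty")
    pos = -len(needle)  # so that the first search starts at index 0
    for _ in range(n):
        pos = text.find(needle, pos + len(needle))
        if pos == -1:
            return -1  # fewer than n occurrences in the text
    return pos


# Made-up sample: ten short lines, each followed by an 11-character separator line.
sample = "".join(f"line {i}\n===========\n" for i in range(10))
print(nth_occurrence(sample, "=" * 11, 6))   # 102
print(nth_occurrence(sample, "=" * 11, 11))  # -1
```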
I have something that might work. It does, however, require that the cursor be at the very start of the file (so the file must be open in Notepad++). Once the search has found the text, running the search again carries on to the next multiple of the number you want, so the 60th, then the 120th, etc.
The regex also needs the ‘. matches newline’ option ticked.
Maybe someone could expand on this or adapt it to fit better.
So if you want the 6th occurrence, use 5 in the regex (where you see 5). If you want the 60th, use 59. It might not work if there are any sequences of 2 lines together with “=” in them. If there are not exactly 11 “=” characters, change that number in the regex to suit.
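Terry's idea (skip n−1 occurrences, then take the next one) can be tried outside Notepad++ with Python's re module. This is a sketch under stated assumptions, not Terry's exact regex: Python's re has no \K, so a capture group marks the n-th occurrence instead; re.DOTALL plays the role of the ". matches newline" option; and match() anchors the attempt at offset 0, like having the cursor at the very start of the file. The helper name nth_match_span is hypothetical.

```python
import re


def nth_match_span(text: str, needle: str, n: int):
    """Hypothetical helper: (start, end) of the n-th occurrence of needle, or None."""
    if n < 1:
        raise ValueError("n must be >= 1")
    sep = re.escape(needle)
    # (?:.*?SEP){n-1} lazily skips n-1 occurrences; group 1 captures the n-th one.
    pattern = re.compile(r"(?:.*?%s){%d}.*?(%s)" % (sep, n - 1, sep), re.DOTALL)
    m = pattern.match(text)  # anchored at offset 0: "cursor at the start of the file"
    return m.span(1) if m else None


sample = "".join(f"line {i}\n===========\n" for i in range(10))
print(nth_match_span(sample, "=" * 11, 6))  # (102, 113)
```

Because the quantifier counts whole separator matches, changing the 6th to the 60th only means changing n, which mirrors "use 59 for the 60th" in the regex above.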
Terry R is heading in the direction I was going, but he posted first. I would slightly tweak his regex (which works) to get this:
- (?s) at the start, to avoid having the “newline” box be a dependency
- Remove some unnecessary group-capturing parentheses
- .* at the end, to prevent matching every multiple-of-six grouping (it will highlight/mark from the sixth grouping of = through the end-of-file…maybe this is undesirable, but I think in a large file it would make it easier to find where the highlighting/marking begins!)
Note that the OP said he wanted to highlight the match, so I take this to mean using the Mark tab of the Find window for this.
Terry R said:
It might not work if there are any sequences of 2 lines together with “=” in them
It WILL work in such a case!
Here’s an explanation of the regex:
- Use these options for the whole regular expression
- Match the regular expression below
- Match any single character
- Keep the text matched so far out of the overall regex match
- Match the character string “===========” literally
- Match any single character
Created with RegexBuddy
Thanks Scott for those tweaks.
I have a question about why the \A didn’t work in this instance. Had it worked, the search would ALWAYS start at the start of the file, and even if run multiple times it would still select only the nth occurrence. Also, your .* at the end of the regex wouldn’t then be required.
I do continue to forget the (?i) at the start, often using the tick boxes over this approach. Of course, using these allows for a standard approach to the regex (not having to change tick boxes for every expression) and allows each regex to stand on its own.
I hadn’t tested the case of 2 consecutive lines of '='s, hence my disclaimer, which I only added while typing the reply (my testing wasn’t exhaustive).
(The day you stop learning is the day you die!)
guy038
The \A feature is broken in the N++ implementation of the Boost regex library. For an alternate regex engine, see the last part of the updated FAQ topic, below:
However, I found a very simple way to prevent matching every 6 times the
Just add, at the very beginning of the file, a specific expression that does not exist anywhere else in the current file. Then, Scott, we just put that expression first in your regex :-)) For instance:
I added the ### mark on top of Rumi’s text
And I used the modified regex
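guy038's marker trick can also be sketched in Python. Assumptions: the marker ### occurs nowhere else in the file, and the pattern below is an illustrative stand-in, not the modified regex from the post. Because every match has to begin at the unique marker, at most one match can exist, no matter where a search starts.

```python
import re

MARKER = "###"  # assumed to appear only once, at the very top of the file
SEP = re.escape("=" * 11)
N = 6

# Marker first, then lazily skip N-1 separators and capture the N-th one.
pattern = re.compile(
    r"%s(?:.*?%s){%d}.*?(%s)" % (re.escape(MARKER), SEP, N - 1, SEP), re.DOTALL
)

sample = MARKER + "\n" + "".join(f"line {i}\n===========\n" for i in range(20))

matches = list(pattern.finditer(sample))
print(len(matches), matches[0].span(1))  # 1 (106, 117)
print(pattern.search(sample, 50))        # None: no marker after offset 50
```

Scott's objection in the next reply still applies: this only works if you are allowed to edit the source text.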
I added the ### mark on top of the Rumi’s text
Well…sure, but when one answers a regex question here, one sort of assumes that changing the OP’s source text to solve a problem is not allowed. :-)
Of course, doing that when solving a problem that requires “table-building”…well, then…maybe in that case we bend the rules, eh? :-D
I think I may have found another regex (building on what we already have) that will ALWAYS select the nth occurrence, no matter where in the file the cursor is. I used another string check that I’d provided some weeks ago to someone who had said the \A didn’t work for them. My latest rendition is:
(?<!\x0A)^ should (hopefully) look for a line-start position that has no line feed/carriage return immediately before it (actually I only test for the line feed portion). So far my tests have shown that it ALWAYS selects only the 6th occurrence (in the example), and even a second click does not change the position. With the cursor placed at various positions within the file, it still correctly locates the 6th occurrence.
This removes the need to be careful about where the cursor is in the file before looking for the nth occurrence, and also the need to include the additional .* at the end to grab the rest of the content, which was there to prevent a second click from going to the (n × 2)th occurrence.
Since \A seems very problematic, I’ve now added (?<!\x0A)^ to my arsenal!
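The idea behind (?<!\x0A)^ can be checked in an engine whose lookbehind is allowed to see text before the search position, such as Python's re (a sketch with made-up sample lines). With MULTILINE on, ^ matches at every line start, but the negative lookbehind rejects every line start preceded by a line feed, which leaves only offset 0. Note that Python's ^ treats only \n as a line break, so this does not reproduce the Boost/Notepad++ behavior discussed later in the thread.

```python
import re

text = "line 1\nline 2\nline 3"

every_line_start = re.compile(r"^", re.MULTILINE)
file_start_only = re.compile(r"(?<!\x0A)^", re.MULTILINE)  # \x0A is the line feed

print([m.start() for m in every_line_start.finditer(text)])  # [0, 7, 14]
print([m.start() for m in file_start_only.finditer(text)])   # [0]

# Starting mid-file finds nothing: Python's lookbehind still sees the "\n" before offset 7.
print(file_start_only.search(text, 7))  # None
```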
Nice one…hopefully no downside to this is found. I’m not “with” Notepad++ or RegexBuddy right now, but maybe this also works?
(?<!\R)^
Maybe not, though, as lookbehinds must be of constant length and \R could be of length 1 (in the case of \n) or length 2 (for \r\n)…it would be nice if it did, though, because I think it is nicer on the eyes than
Sorry, not a sheriff, maybe a deputy.
I’m quite enjoying helping out (where I can), although I do need to curb my enthusiasm somewhat. And thanks to you, @guy038, for your support.
And yes, Scott, I agree the \R or \r\n would probably look better; I just haven’t tested that yet. As my dad always said, measure twice, cut once. So I need to test, refine, and test again before presenting!
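Scott's doubt about (?<!\R)^ can be probed quickly in Python's re, which only accepts fixed-width lookbehinds. This is Python's rule, offered as an illustration; Boost's exact rules may differ, and Python has no \R escape at all. The compiles helper is hypothetical.

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(r"(?<!\r|\n)^"))    # True: both alternatives are one character wide
print(compiles(r"(?<!\r\n|\n)^"))  # False: alternatives of width 2 and 1
print(compiles(r"(?<!\R)^"))       # False: Python's re has no \R escape
```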
guy038
Very clever deduction, Terry. If we generalize to any kind of EOL characters, it gives
Note that I added the \f syntax (the Form Feed control character, decimal 12) because, given, for instance, the text abcdefghij with the Form Feed char between the strings abcde and fghij, the regex (?-s)^. would also match the letter f right after the FF char!
So, in short, the regex (?<!\n|\r|\f)^ seems a very nice work-around to emulate the buggy \A feature of the N++ regex engine :-))
I used the verb “seems” and not the verb “is” because, unfortunately, there are still some problems with that syntax :-((
Let’s work on the sample text below, which you can copy into a new tab:
Notepad++ v7.5.8 bug-fixes: This is a simple text 12345><67890 to test the \A feature
Note: For all the tests below, the options Wrap around are ticked!
- First problem:
Let’s suppose that your cursor is located between the < characters, on the 5th line. Using the regex (?<!\n|\r|\f)^(?-s). (which should stand for \A(?-s).), it does find the letter N of Notepad++, on top of the text, and any further click on the Find Next button does not find anything else. Nice!
Now place the cursor at the beginning of the 5th line, right before the 1 digit, without any selection, and re-run the (?<!\n|\r|\f)^(?-s). regex. This time, the first click on the Find Next button wrongly matches the 1 digit. The second click finds the letter N, as expected, and any subsequent clicks do nothing.
- Second problem:
Let’s apply the new regex (?<!\n|\r|\f)^(?-s).*\R.* (which should be a work-around for \A(?-s).*\R.*) against our sample text. The result is identical to what I described in the point just above. That is to say:
If the cursor was between the < characters, it matches all contents of the 1st line, with its EOL chars, and all contents of the 2nd line, without its EOL chars
If the cursor was at the beginning of the 5th line, then:
After a first click, it matches all contents of the 5th line, with its EOL chars, + all contents of the 6th line, without its EOL chars
After a second click, it matches all contents of the 1st line, with its EOL chars, + all contents of the 2nd line, without its EOL chars
Now, let’s slightly change the regex, adding an \R syntax at the end, which becomes:
Now, even if we place the cursor between the < characters, any click on the Find Next button will match, successively, two consecutive lines (the 2nd, then the 4th one,… and so on) :-((
Just because the end of this regex matches a
Anyway, Terry, don’t be sad! Logically, your (?<!\n|\r|\f)^ regex should work as a work-around for the \A syntax. It fails simply because our present regex engine does not handle backward assertions properly either! I haven’t tested it yet, but I suppose your regex should work in some regex testers on the Web :-))
And I agree with Scott: you certainly are a “regex sheriff”!
So after some thinking about this, I’ve decided that I don’t think the \A syntax is broken in Notepad++, and I don’t think that lookbehind assertions (either positive or negative) are broken in Notepad++ either. One just has to fully understand how Notepad++ searching works. And @guy038, in your post just above, where you talk about a “first problem” and a “second problem” and beyond…I have NO problem with how Notepad++ works in these cases, given my new thinking!
Every Notepad++ search has a “starting-point”: either where the user initiates the search, or where Notepad++ itself initiates the next search after a successful match during one of the “find all” searches (or a Find in Files). Each starting-point has exactly NOTHING before it. YOU may SEE data before your caret when you initiate a Find Next, but Notepad++ doesn’t. And that, IMO, isn’t necessarily a broken search feature; it is just “the way it works”. To Notepad++, each starting-point looks the same as a start-of-file does: no data (aka NOTHING) comes before it.
At the beginning of your regex, an \A assertion or ANY valid negative lookbehind assertion will match the NOTHING right at a starting-point (i.e., that part of the regex will always succeed). Note that a negative lookbehind assertion doesn’t match the real data to the left of your caret; it matches the NOTHING. Example: Have 12345678 (only) in your buffer and your caret between the 5. Do a regex search for (?<!4)5 and see that it matches the 5. The (?<!4) in this case allows the match because NOTHING is not 4. Some would call this regex behavior “broken”…because YOU the user can see the darned
While a search is “in-progress” on a buffer, however, no one would call negative lookbehind assertions broken. Example: Use the same buffer as before but have your caret between the 2. Do a regex search for (?<!4)5 and see that there is no match. In this case there is a 5 (the search is “underway” at this point and Notepad++ is looking at the buffer…and this time the 4 is a part of that buffer), so the assertion fails the overall match.
Side note: With “Wrap around” ticked during a search, if no match is found between the current caret point and the end-of-file, Notepad++ does a second (internal) search with a starting-point at the start-of-file. This is an additional opportunity for a match to happen at a “starting-point”. For more info on the “2nd search”, see here…quite far into that thread… Although that thread discusses replacement, find is a part of replacement, so it works the same way.
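The “NOTHING before a starting-point” explanation can be modeled in a few lines of Python. This is an assumed model, not Notepad++'s actual code: the hypothetical npp_like_find_next hands the regex only the text from the starting point onward, so nothing exists before it, and with wrap-around it retries once from start-of-file (simplified here to a scan of the whole buffer). For contrast, Python's own search(text, pos) keeps the text before pos visible to lookbehinds.

```python
import re


def npp_like_find_next(pattern, text, caret, wrap_around=True):
    """Hypothetical model: the regex sees text[caret:] only, so NOTHING precedes the start."""
    m = pattern.search(text[caret:])
    if m:
        return (caret + m.start(), caret + m.end())
    if wrap_around:
        m = pattern.search(text)  # simplified second (internal) search from start-of-file
        if m:
            return m.span()
    return None


def lookbehind_sees_everything(pattern, text, caret):
    """Python's pos argument: lookbehinds can still inspect text before the caret."""
    m = pattern.search(text, caret)
    return m.span() if m else None


buffer = "12345678"
rx = re.compile(r"(?<!4)5")

print(npp_like_find_next(rx, buffer, 4))          # (4, 5): caret right before the 5
print(lookbehind_sees_everything(rx, buffer, 4))  # None: the 4 is visible
print(npp_like_find_next(rx, buffer, 1))          # None: the 4 is inside the searched text
```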
So that brings us back to the regex under discussion:
It will work like an “unbroken” \A in most cases. A possible problem with it comes when doing “multiple” searches with it (or when the user-starting-point isn’t the start-of-file). In those cases, if the previous action with it leaves the next starting-point at the start of any line, the next search will match right there, not only at start-of-file, which may not be what the user desires. (Again, it will match that case because the (?<!\r|\n) part will match the NOTHING and the
The best way to use it as an \A-equivalent is to have the end of your regex not leave the next starting-point at any start-of-line.
So let’s go back to Terry’s original regex (or very close to it):
When run on the OP’s sample data (duplicated a few times so that there are many more than 9 lines of =========== data), this will find exactly ONE match when a “find all” is done. That is because the end of the regex sets up the follow-on starting-point to NOT be able to match a start-of-line (needed by the
Changing the regex slightly at the end:
This one will result in multiple matches using a “find all” in the OP’s (extended) data, for reasons which should now be apparent. Thus it is worth pointing out that, in such a case, the (?<!\r|\n)^ regex is NOT doing what one normally thinks \A should be doing. So while it can be an \A substitute, it still has to be used with some amount of caution and, of course, understanding. :-)
\A: You can be the judge of whether or not it is broken. The Boost documentation for \A says “Matches at the start of a buffer only”; does one consider the “buffer” to be the entirety of the Notepad++ editor tab data, or the text from the starting-point(s) through a later point for a search? Your call. :-)
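Python's re gives one concrete answer to the “what is the buffer” question, shown here only as an illustration (it says nothing about how Boost or Notepad++ define it): with search(text, pos), \A still means offset 0 of the whole string, whereas slicing the string creates a new buffer whose start \A does match.

```python
import re

starts_with_5 = re.compile(r"\A5")
buffer = "12345678"

# pos=4 moves where the search begins, but \A still refers to offset 0 of the whole string.
print(starts_with_5.search(buffer, 4))          # None
# The slice "5678" is a new string, i.e. a new "buffer" that starts at the 5.
print(starts_with_5.search(buffer[4:]).group())  # 5
```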
And now onto \G. In this thread, there are 3 conditions specified for where the \G assertion can match. I believe there is only ONE place it can match; hint: the “starting-point” of a search. :-)