How to find and highlight a specific occurrence of a symbol?
-
How to find and highlight a specific occurrence of a symbol in a large text file?
Example:
Text
TextText
Text
TextText
TextText
TextText
Text
Text
Text
Need to find the 6th occurrence of =========== and highlight it.
Thanks
-
What have you tried and how isn’t it working? It might be instructive to see your thought processes in finding a solution…
-
I tried to find it using the following regex:
(?-s).(\R).\R.*=========
But it’s manual, and it highlights all =========
I want only a specific occurrence. It could be the 6th or 60th or even the 100th.
-
I have something that might work. It does require that you have the cursor at the very start of the file, however (so the file must be open in Notepad++). Once the search has found the text, if you hit search again it will carry on to the next multiple of the number you want, so 60th, then 120th, etc.
The “. matches newline” option needs to be ticked as well.
Expression is
Find: ((.+?)===========){5}(.+?)\K(===========)
Maybe someone could expand on this, alter it to fit better.
So if you want the 6th occurrence, use 5 in the regex (where you see 5). If 60th, then use 59. It might not work if there are any sequences of 2 lines together with “=” in them. If there are not 11 “=”, then change the number of “=” in the regex to suit.
Terry
-
Terry R is heading in the direction I was going, but he posted first. I would slightly tweak his regex (which works) to get this:
(?s)(?:.+?===========){5}.+?\K===========.*
The tweaks:
- Use (?s) at the start to avoid having the “newline” box be a dependency
- Remove some unnecessary group-capturing parentheses
- Add .* at the end to prevent matching every multiple-of-six grouping (it will highlight/mark from the sixth grouping of = thru the end-of-file…maybe this is undesirable but I think in a large file it would make it easier to find where the highlighting/marking begins!)
Note that the OP said he wanted to highlight the match, so I take this to mean using the Mark tab of the Find window for this.
Terry R said:
It might not work if there are any sequences of 2 lines together with “=” in them
It WILL work in such a case!
Here’s an explanation of the regex:
- Use these options for the whole regular expression (dot matches line breaks): (?s)
- Match the regular expression below, exactly 5 times: (?:.+?===========){5}
- Match any single character, one or more times, as few times as possible: .+?
- Keep the text matched so far out of the overall regex match: \K
- Match the character string “===========” literally: ===========
- Match any single character, zero or more times, as many times as possible: .*
Created with RegexBuddy
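For anyone who wants to check the counting logic outside Notepad++, here is a minimal Python sketch of the same idea. It is an illustration only, not what Notepad++ or Boost does internally: Python's re module has no \K, so the final marker is captured in a group instead, and the helper name nth_occurrence_span is made up for this example. It also uses .*? rather than .+? so that a marker at the very start of the text would still be counted.

```python
import re

def nth_occurrence_span(text, marker, n):
    """Return (start, end) of the n-th occurrence of marker (1-based), or None."""
    if n < 1:
        raise ValueError("n must be >= 1")
    escaped = re.escape(marker)
    # (?s) lets '.' cross line breaks; the lazy group skips the first n-1 markers.
    # Python's re has no \K, so the final marker is captured in group 1 instead.
    pattern = re.compile(r"(?s)(?:.*?%s){%d}.*?(%s)" % (escaped, n - 1, escaped))
    m = pattern.match(text)  # match() anchors at offset 0, like starting at the top of the file
    return m.span(1) if m else None

# Hypothetical sample: nine text lines, each followed by a marker line.
sample = "".join("Text%d\n===========\n" % i for i in range(1, 10))
span = nth_occurrence_span(sample, "===========", 6)  # span of the sixth marker only
```

As in the thread, asking for the 6th occurrence means repeating the skip group 5 times, which is why the quantifier is built from n - 1.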
-
Thanks Scott for those tweaks.
I have a question about why the \A didn’t work in this instance. Had it done so, then it would ALWAYS start at the start of the file, and even if run multiple times it would still select only the nth occurrence. As well, your .* at the end of the regex wouldn’t then be required.
I do continue to forget the (?s) and (?i) at the start, often using the tick boxes over this approach. Of course using these allows for a standard approach to the regex (not having to change tick boxes for every expression) and allows each regex to stand on its own.
I hadn’t tested the occurrence of 2 consecutive lines of '='s, hence my disclaimer; that was only added into the reply as I was typing it (my testing wasn’t exhaustive).
Terry
(The day you stop learning is the day you die!)
-
Hi, @rumi-balkhi, @scott-sumner, @terry-r, and All,
Terry, the \A feature is broken in the N++ implementation of the Boost Regex library. For an alternate regex engine, see the last part of the updated FAQ topic, below:
https://notepad-plus-plus.org/community/topic/15765/faq-desk-where-to-find-regex-documentation
However, I found a very simple way to prevent matching every sixth =========== line! Just add a specific expression, which does not exist elsewhere in the current file, at the very beginning of the file. Then, Scott, we just put that expression at the start of your regex :-)) For instance:
- I added the ### mark on top of Rumi’s text
- And I used the modified regex (?s)###(?:.+?===========){5}.+?\K===========
Voilà !
Best Regards,
guy038
-
-
@guy038 said:
I added the ### mark on top of the Rumi’s text
Well…sure, but when one answers a regex question here, one sort of assumes that changing the OP’s source text to solve a problem is not allowed. :-)
Of course, doing that when solving a problem that requires “table-building”…well, then…maybe in that case we bend the rules, eh? :-D
-
I think I may have found another regex (building on what we already have) that will, no matter where in the file the cursor is, ALWAYS select the nth occurrence. I used a string check that I’d provided some weeks ago to someone who originally stated that \A didn’t work for them. My latest rendition is:
Find: (?s)(?<!\x0A)^((.+?)===========){5}(.+?)\K(===========)
So the (?<!\x0A)^ should (hopefully) look for a line-start position that has no line feed/carriage return immediately before it (actually I only test for the line feed portion). So far my tests have shown it ALWAYS selects only the 6th occurrence (in the example), and even a second click does not change position. With the cursor placed in various positions within the file it still correctly locates the 6th occurrence.
This negates the need to be careful where the cursor is in the file before looking for the nth occurrence, and also the need to include the additional .* at the end to grab the rest of the content, thus preventing a second click going to the nth * 2 occurrence.
Since the \A seems very problematic I’ve now added (?<!\x0A)^ to my arsenal!
Terry
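As a rough illustration of why that lookbehind narrows ^ down to the top of the file, here is a small Python sketch. It uses Python's re module in multiline mode rather than the Boost engine in Notepad++, and it always scans from offset 0, so it says nothing about what happens when a search starts mid-file. The sample text is made up.

```python
import re

text = "first\nsecond\nthird"

# Plain ^ in multiline mode: one zero-width match at the start of every line.
line_starts = [m.start() for m in re.finditer(r"(?m)^", text)]

# The negative lookbehind rejects every line start that has \n right before it,
# which leaves only the very beginning of the text.
file_start_only = [m.start() for m in re.finditer(r"(?m)(?<!\x0A)^", text)]
```

Here line_starts holds one offset per line, while file_start_only holds only offset 0.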
-
Nice one…hopefully a downside to this is NOT found. I’m not “with” Notepad++ or RegexBuddy right now, but maybe this also works?:
(?<!\R)^
Maybe not, though, as lookbehinds must be of constant length and \R could be of length 1 (in the case of \r or \n) or length 2 (for \r\n)…would be nice if it did though because I think it is nicer on the eyes than \x0A. :-)
@guy038, I think there’s a(nother) regex sheriff in town, and his name is @Terry-R…
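The constant-length concern is easy to see in an engine that enforces it strictly. The sketch below uses Python's re module, which is an assumption for illustration: it is not the Boost engine, it has no \R at all, and Boost may be more or less lenient. The helper name compiles is made up.

```python
import re

def compiles(pattern):
    """Return True if Python's re accepts the pattern, False if it rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Both alternatives are exactly one character wide, so the lookbehind is accepted.
fixed_width = compiles(r"(?<!\r|\n)^")

# \r\n is two characters and \n is one, so the lookbehind has no single width.
variable_width = compiles(r"(?<!\r\n|\n)^")
```

In Python the first pattern compiles and the second is rejected, which is the same reason a one-character test such as \x0A is the safer thing to put inside a lookbehind.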
-
Sorry, not a sheriff, maybe a deputy.
I’m quite enjoying helping out (where I can), although I do need to curb my enthusiasm somewhat. And thanks to you, Scott Sumner and @guy038, for your support.
And yes, Scott, I agree the \R or \r\n would probably look better; I just haven’t tested that yet. As my dad always said, measure twice, cut once. So I need to test, refine, test again before presenting!
Terry
-
Hello @terry-r, @scott-sumner, @rumi-balkhi and All,
Very clever deduction, Terry. If we generalize to any kind of EOL characters, it gives:
(?<!\n|\r|\f)^
Note that I added the \f syntax (control character Form Feed, \x0C or 012 decimal) because, given, for instance, the text abcdefghij, with the Form Feed char between the strings abcde and fghij, the regex (?-s)^. would also match the f letter, right after the FF char!
So, to be short, the regex (?<!\n|\r|\f)^ seems a very nice work-around to emulate the bugged \A feature of the N++ regex engine :-))
I used the verb seems and not the verb is because, unfortunately, there are still some problems with that syntax :-((
Let’s work on that sample text, below, that you will copy on a new tab :
Notepad++ v7.5.8 bug-fixes: This is a simple text 12345><67890 to test the \A feature
Note: For all the tests below, the options Regular expression and Wrap around are ticked!
- First problem:
Let’s suppose that your cursor is located between the > and < characters, on the 5th line. Using the regex (?<!\n|\r|\f)^(?-s). ( which should stand for \A(?-s). ), it does find the letter N of Notepad++, on top of the text, and any other click on the Find Next button does not find anything else. Nice!
Now place the cursor at the beginning of the 5th line, right before the 1 digit, without any selection, and re-run the (?<!\n|\r|\f)^(?-s). regex. This time, the first click on the Find Next button wrongly matches the 1 digit. The second click finds the letter N as expected, and any subsequent clicks do nothing.
- Second problem:
Let’s apply the new regex (?<!\n|\r|\f)^(?-s).*\R.* ( which should be a work-around for \A(?-s).*\R.* ) against our sample text. The result is just identical to what I described in the point just above. That is to say:
- If the cursor was between the > and < characters, it matches all contents of the 1st line, with its EOL chars, and all contents of the 2nd line, without its EOL chars
- If the cursor was at the beginning of the 5th line, then:
  - After a first click, it matches all contents of the 5th line, with its EOL chars + all contents of the 6th line, without its EOL chars
  - After a second click, it matches all contents of the 1st line, with its EOL chars + all contents of the 2nd line, without its EOL chars
- Now, let’s slightly change the regex, adding an \R syntax at the end of the regex, which becomes:
(?<!\n|\r|\f)^(?-s).*\R.*\R
Now, even if we place the cursor between the > and < characters, each click on the Find Next button will match, successively, two consecutive lines (the 1st and the 2nd, then the 3rd and the 4th, and so on) :-((
Just because the end of this regex matches a \r or \n character!
Anyway, Terry, don’t be sad! Logically, your (?<!\n|\r|\f)^ regex should work as a work-around for the \A syntax. It’s simply that our present regex engine does not handle backward assertions properly, either! I didn’t test it yet, but I suppose that your regex should work in some regex testers on the Web :-))
And I agree with Scott: you certainly are a “regex sheriff”!
Cheers,
guy038
-
So after some thinking about this, I’ve decided that I don’t think the \A syntax is broken in Notepad++, and I don’t think that lookbehind assertions (either positive or negative) are broken in Notepad++ either. One just has to fully understand how Notepad++ searching works. And @guy038, in your post just above, where you talk about a “first problem” and a “second problem” and beyond…I have NO problem with how Notepad++ works in these cases, given my new thinking!
Every Notepad++ search has a “starting-point”, either when the user initiates a search or when Notepad++ itself initiates one after a successful match during one of the “find all” searches (or a Find in Files). Each starting-point has exactly NOTHING before it. YOU may SEE data before your caret when you initiate a Find Next, but Notepad++ doesn’t. And that, IMO, isn’t necessarily a broken search feature, it is just “the way it works”. To Notepad++, each starting-point appears the same as a start-of-file does–no data (aka NOTHING) comes before it.
At the beginning of your regex, an \A assertion or ANY valid negative lookbehind assertion will match the NOTHING right at a starting-point (i.e., that part of the regex will always succeed). Note that a negative lookbehind assertion doesn’t match the real data to the left of your caret, it matches the NOTHING. Example: Have 12345678 (only) in your buffer and your caret between the 4 and the 5. Do a regex search for (?<!4)5 and see that it matches the 5. The (?<!4) in this case allows the match because NOTHING is not 4. Some would call this regex behavior “broken”…because YOU the user can see the darned 4 right there!
While a search is “in-progress” on a buffer, however, no one would call negative lookbehind assertions broken. Example: Use the same buffer as before but have your caret between the 1 and the 2. Do a regex search for (?<!4)5 and see that there is no match. In this case there is a 4 before the 5 (the search is “underway” at this point and Notepad++ is looking at the buffer…and this time the 4 is a part of that buffer) so the assertion fails the overall match.
Side note: With “Wrap around” ticked during a search, if no match is found between the current caret point and the end-of-file, a second (internal) search is done by Notepad++, with a starting-point at the start-of-file. This is an additional opportunity for a match to happen at a “starting point”. For more info on the “2nd search”, see here…quite far into that thread… Although that thread discusses replacement, find is a part of replacement, so it works the same way.
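The 12345678 example can be mimicked in Python as an analogy for this “NOTHING before the starting-point” model. This is only a sketch under the assumption that slicing the string at the caret is a fair stand-in for what the engine is handed; it is not a claim about Notepad++ internals, and the helper name find_from_caret is made up.

```python
import re

def find_from_caret(pattern, buffer, caret):
    """Search buffer as if nothing existed before caret; return an absolute (start, end) or None."""
    m = re.search(pattern, buffer[caret:])  # slicing hides everything left of the caret
    if m is None:
        return None
    return (m.start() + caret, m.end() + caret)

buffer = "12345678"

# Caret between 4 and 5: the lookbehind sees nothing, so the 5 is matched.
at_starting_point = find_from_caret(r"(?<!4)5", buffer, 4)

# Caret between 1 and 2: the 4 is inside the searched text, so the lookbehind blocks the match.
search_underway = find_from_caret(r"(?<!4)5", buffer, 1)

# For contrast, Python's own pos argument keeps the text before pos visible to lookbehinds.
with_pos = re.compile(r"(?<!4)5").search(buffer, 4)
```

The first call finds the 5 and the second finds nothing, mirroring the two examples above. The last line shows that whether text before the starting-point is visible is a design choice of the search interface: Python's pos argument keeps it visible, so that search finds nothing.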
So that brings us back to the regex under discussion:
(?<!\r|\n)^
It will work like an “unbroken” \A in most cases. A possible problem with it comes when doing “multiple” searches with it (or when the user-starting-point isn’t the start-of-file). In those cases, if the previous action with it leaves the next starting-point at the start of any line, the next search will match right there – not only at start-of-file – which may not be what the user desires. (Again, it will match that case because the (?<!\r|\n) part will match the NOTHING and the ^ will hit.)
The best way to use it as an \A-equivalent is to have the end of your regex not leave the next starting-point at any start-of-line.
So let’s go back to Terry’s original regex (or very close to it):
(?s)(?<!\r|\n)^((.+?)===========){5}(.+?)\K(===========)
When run on the OP’s sample data (duplicated a few times so that there are many more than 9 lines of =========== data) this will find exactly ONE match when a “find all” is done. That is because the end of the regex sets up the follow-on starting-point to NOT be able to match a start-of-line (needed by the ^ assertion).
Changing the regex slightly at the end:
(?s)(?<!\r|\n)^((.+?)===========){5}(.+?)\K(===========)\R
This one will result in multiple matches using a “find all” in the OP’s (extended) data, for reasons which should now be apparent. Thus it is worth pointing out in such a case that the (?<!\r|\n)^ regex is NOT what one normally thinks \A should be doing. So while it can be a \A substitute, it still has to be used with some amount of caution, and of course, understanding. :-)
Back to \A: You can be the judge of whether or not it is broken: The Boost documentation for \A says “Matches at the start of a buffer only” – does one consider the “buffer” to be the entirety of the Notepad++ editor tab data, or the starting-point(s) through a later point for a search? Your call. :-)
And now onto \G. In this thread, there are 3 conditions specified for where the \G assertion can match. I believe there is only ONE place it can match; hint: the “starting-point” of a search. :-)