Bugs in Normal search mode for v7.2.2 on Windows 7
-
There are bugs in Notepad++’s text search: it does not find all occurrences.
I’m using v7.2.2 on Windows 7.
For example, with the text:
0220 LET FF2$=STBL("SORTD")+"ScrL"+HHRTE$+"."+DATE(JUL_DATE:"%Yz%Mz%Dz"),ERR=221
,--0221 ERASE FF2$,ERR=0222
`->0222 MKEYED FF2$,[1:1:37],[1:3:7]+[1:1:37],0,300
0224 LET FILE42=UNT; OPEN (FILE42)FF2$
0229 REM
Searching for FF2$ in Normal mode only finds the line ‘OPEN (FILE42)FF2$’.
It also seems that searching for strings that begin or end with whitespace fails.
For example, with the text:
| | | 6219 IF LATE$=LO0$(12,7)
| | | THEN
| | | ,-- GOTO 6240
| | | | 6220 GOSUB 15500
| | | | 6221 LET LATE$=LO0$(12,7)
| | | `->6240 DIM TMP$(37),LOKEY$(20),LOFLAG$(1);
| | ^ LET LOQTY=0,LODOL=0
| v | 6241 LET TMP$(1,2)=HHRTE$(1,2);
| | | REM “Route Number”
Searches for ‘ GOSUB ’, ‘ GOSUB’, or ‘GOSUB ’ fail to find anything, while a search for GOSUB does.
-
I don’t see that behavior: npp (version 7.2.2) finds all of the FF2$ instances, and
GOSUB with or without spaces before and/or after it.
What find options do you use?
If wrap around is not set, make sure the caret is on the first position of the document.Cheers
Claudia -
Your mention of “whitespace” makes me think of a tabs versus spaces issue: you may have tab characters in your document, but when you type into the “Find what” box you likely use a space, and this results in NO MATCH. It will also not match if you have it the other way around (spaces in the document, tab characters in the “Find what” box), but that is far less likely, because getting a real tab character into the “Find what” box takes special effort.
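A minimal Python sketch of the tab versus space point, assuming a hypothetical document line in which a tab, not a space, sits between the line number and GOSUB. The Normal mode search is modelled here as a plain literal comparison with Python’s str.find, which is an approximation for illustration, not Notepad++’s actual code.

```python
# Hypothetical document line: a TAB character (not a space) precedes GOSUB.
doc_line = "6220\tGOSUB 15500"

# A literal search compares character by character, so a space never equals a tab.
print(doc_line.find(" GOSUB"))   # -1: the space typed into Find what does not match the tab
print(doc_line.find("\tGOSUB"))  # 4: matches once the search text holds a real tab
print(doc_line.find("GOSUB "))   # 5: the character after GOSUB really is a space here
print(repr(doc_line))            # repr() makes the hidden tab visible as \t
```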
-
Hello, Harold Worby52,
I think I understand what happened! I bet that you checked, by mistake (or on purpose!), the Match whole word only option.
So, just uncheck the Match whole word only option and everything should be OK :-))
To explain how the search behaves when this option is ON, I first need to talk about word and NON-word characters.
As you may know, a word character is one of:
- Any letter, accented or not, in upper or lower case, plus a few Unicode characters that are considered letter-like
- Any digit from 0 to 9, plus a few Unicode characters that are considered number-like
- The underscore character, _, also called Low Line, with Unicode code-point \x{005F}
So, a NON-word character is simply any single character that does NOT belong to the three categories above!
These are the official definitions of a word character (\w) and of a NON-word character (\W) when the Regular expression search mode is used.
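To illustrate the \w / \W split, here is a small Python sketch. It uses Python’s own re module as a stand-in: for str patterns, Python documents \w as matching the characters for which str.isalnum() is true, plus the underscore. Notepad++ uses a different regex engine (Boost.Regex), so it may classify some Unicode characters differently; treat this as an approximation.

```python
import re

WORD_CHAR = re.compile(r"\w")  # Python's Unicode-aware word class, used as a stand-in


def is_word_char(ch: str) -> bool:
    """Return True if the single character ch matches the regex word class."""
    return WORD_CHAR.fullmatch(ch) is not None


# Letters (accented or not), digits and the underscore are word characters;
# symbols and the space are NON-word characters.
for ch in ["a", "É", "7", "_", "$", " ", "=", "("]:
    print(repr(ch), "word (\\w)" if is_word_char(ch) else "NON-word (\\W)")
```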
Now, when you use the Normal search mode AND the Match whole word only option is enabled, the behaviour is a bit different:
- If the FIRST character of the text in the Find what zone belongs to the strict range [0-9A-Za-z_] OR to the range [\x{0080}-\x{ffff}], then a match is found if the PREVIOUS character is either:
  - Any C0 Control character, in the range [\x00-\x1F]
  - A character with Unicode code-point < \x80 that does NOT belong to the strict range [0-9A-Za-z_]
- If the FIRST character of the text in the Find what zone does NOT belong to the strict range [0-9A-Za-z_] AND does NOT belong to the range [\x{0080}-\x{ffff}] either, then a match is found if the PREVIOUS character is either:
  - Any C0 Control character, in the range [\x00-\x1F]
  - A word character, in the strict range [0-9A-Za-z_]
  - A character from the range [\x{0080}-\x{ffff}]
And, in the same way:
- If the LAST character of the text in the Find what zone belongs to the strict range [0-9A-Za-z_] OR to the range [\x{0080}-\x{ffff}], then a match is found if the NEXT character is either:
  - Any C0 Control character, in the range [\x00-\x1F]
  - A character with Unicode code-point < \x80 that does NOT belong to the strict range [0-9A-Za-z_]
- If the LAST character of the text in the Find what zone does NOT belong to the strict range [0-9A-Za-z_] AND does NOT belong to the range [\x{0080}-\x{ffff}] either, then a match is found if the NEXT character is either:
  - Any C0 Control character, in the range [\x00-\x1F]
  - A word character, in the strict range [0-9A-Za-z_]
  - A character from the range [\x{0080}-\x{ffff}]
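The four rules above boil down to one check per side of a candidate match. The following is a minimal Python sketch that follows those rules exactly as written here; it is not Notepad++’s or Scintilla’s source code, and the real editor may treat some characters differently. The function names are hypothetical, and treating the start and end of the document as valid boundaries is an assumption, since the rules above do not mention them.

```python
from __future__ import annotations

# The strict range [0-9A-Za-z_]
ASCII_WORD = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_")


def word_like(ch: str) -> bool:
    # [0-9A-Za-z_] OR [\x{0080}-\x{ffff}]
    return ch in ASCII_WORD or 0x80 <= ord(ch) <= 0xFFFF


def edge_ok(edge: str, neighbor: str | None) -> bool:
    """edge: FIRST (or LAST) character of the Find what text.
    neighbor: PREVIOUS (or NEXT) character in the document, None at a document boundary."""
    if neighbor is None:
        return True  # assumption: the rules above do not cover document boundaries
    if word_like(edge):
        # code-point < 0x80 and outside [0-9A-Za-z_]; this already includes the C0 controls
        return ord(neighbor) < 0x80 and neighbor not in ASCII_WORD
    # any other edge: neighbor must be C0 [\x00-\x1F], [0-9A-Za-z_] or [\x{0080}-\x{ffff}]
    return ord(neighbor) <= 0x1F or word_like(neighbor)


def whole_word_find_all(text: str, needle: str) -> list[int]:
    """Start offsets of every occurrence of needle that passes both edge checks."""
    if not needle:
        return []
    hits = []
    pos = text.find(needle)
    while pos != -1:
        end = pos + len(needle)
        prev_ch = text[pos - 1] if pos > 0 else None
        next_ch = text[end] if end < len(text) else None
        if edge_ok(needle[0], prev_ch) and edge_ok(needle[-1], next_ch):
            hits.append(pos)
        pos = text.find(needle, pos + 1)
    return hits


print(whole_word_find_all("6220 GOSUB 15500", "GOSUB"))         # [5]
print(whole_word_find_all("0221 ERASE FF2$,ERR=0222", "FF2$"))  # []
print(whole_word_find_all("OPEN (FILE42)FF2$\n", "FF2$"))       # [13]
```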
So, Harold, if you type the string FF2$ in the Find what zone (with a letter as its first character and a symbol as its last character), you’ll get a match ONLY IF both conditions below are true:
- The PREVIOUS character must be either:
  - Any C0 Control character, in the range [\x00-\x1F]
  - A character with Unicode code-point < \x80 that does NOT belong to the strict range [0-9A-Za-z_]
- The NEXT character must be either:
  - Any C0 Control character, in the range [\x00-\x1F]
  - A word character, in the strict range [0-9A-Za-z_]
  - A character from the range [\x{0080}-\x{ffff}]
Given your original text, below:
0220 LET FF2$=STBL("SORTD")+"ScrL"+HHRTE$+"."+DATE(JUL_DATE:"%Yz%Mz%Dz"),ERR=221
,--0221 ERASE FF2$,ERR=0222
`->0222 MKEYED FF2$,[1:1:37],[1:3:7]+[1:1:37],0,300
0224 LET FILE42=UNT; OPEN (FILE42)FF2$
0229 REM
There are four occurrences of the string FF2$:
- FF2$=
- FF2$,
- FF2$,
- )FF2$
It is easy to see that:
- The conditions on the PREVIOUS character are satisfied: the space and the closing parenthesis do not belong to [0-9A-Za-z_].
- The conditions on the NEXT character are satisfied only in the last case. Indeed, the = sign and the comma do NOT belong to [0-9A-Za-z_], nor to the [\x00-\x1F] or [\x{0080}-\x{ffff}] ranges. But, in the last case, the string FF2$ is followed by the EOL character, \n, which does belong to the range [\x00-\x1F]. So a match of FF2$ is found!
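The same reasoning can be replayed on Harold’s text with a short Python sketch that applies the two FF2$-specific conditions above to each occurrence. The helper names are hypothetical, and the "\n" line endings are an assumption: in a Windows CRLF file the character after the last FF2$ would be \r, which is also in [\x00-\x1F], so the outcome is the same.

```python
import re

# Harold's sample with its line breaks restored; "\n" line endings are an assumption.
text = (
    '0220 LET FF2$=STBL("SORTD")+"ScrL"+HHRTE$+"."+DATE(JUL_DATE:"%Yz%Mz%Dz"),ERR=221\n'
    ",--0221 ERASE FF2$,ERR=0222\n"
    "`->0222 MKEYED FF2$,[1:1:37],[1:3:7]+[1:1:37],0,300\n"
    "0224 LET FILE42=UNT; OPEN (FILE42)FF2$\n"
    "0229 REM\n"
)

ASCII_WORD = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_")


def prev_ok(ch: str) -> bool:
    # FIRST char of FF2$ is a letter: PREVIOUS char must be < 0x80 and outside [0-9A-Za-z_]
    return ord(ch) < 0x80 and ch not in ASCII_WORD


def next_ok(ch: str) -> bool:
    # LAST char of FF2$ is a symbol: NEXT char must be C0, [0-9A-Za-z_] or [\x{0080}-\x{ffff}]
    return ord(ch) <= 0x1F or ch in ASCII_WORD or 0x80 <= ord(ch) <= 0xFFFF


results = []
for m in re.finditer(re.escape("FF2$"), text):
    before, after = text[m.start() - 1], text[m.end()]
    matched = prev_ok(before) and next_ok(after)
    results.append(matched)
    print(repr(before + m.group() + after), "-> match" if matched else "-> no match")
```

Only the last occurrence, followed by the line ending, is reported as a match, which agrees with what Harold observed.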
As you can easily see, the way the Match whole word only option is handled is quite complicated. So I would advise you to enable this option
ONLY IF
the text in the Find what: field begins and ends with a true word character, that is to say, if you search for a classical word!!
Best Regards,
guy038
-
-
@guy038 said:
[0-9A-Za-a-z_]
You used the above many times (probably via copy and paste) in your great post explaining the “whole words” option of the “Find” dialog (I had no idea of its depth, but had noticed strange behavior in the past when using it with non-“word” characters), but I think you actually meant the following:
[0-9A-Za-z_]
-
Hi, Scott,
Many thanks for spotting my mistake! Ah, I carried my error through the whole of my post :-((
So, simple lesson: beware of what you actually paste!! BTW, I’ve already updated my previous post.
Scott, you’re really an attentive reader, aren’t you? Brrrr! I ought to be careful when I write future posts ;-)
But, Scott, if I make an obvious grammar mistake, or get an English expression or word totally wrong, don’t hesitate to tell me! It will improve “my” English, which is far from fluent!
Cheers,
guy038