Bugs in Normal search for v7.2.2 on Windows 7

Harold Worby52

There seem to be bugs in Notepad++ text search: it does not find all occurrences.
I’m using v7.2.2 on Windows 7.
For example, with the text:

                 0220 LET FF2$=STBL("SORTD")+"ScrL"+HHRTE$+"."+DATE(JUL_DATE:"%Yz%Mz%Dz"),ERR=221
              ,--0221 ERASE FF2$,ERR=0222
              `->0222 MKEYED FF2$,[1:1:37],[1:3:7]+[1:1:37],0,300
                 0224 LET FILE42=UNT;
                      OPEN (FILE42)FF2$
                 0229 REM

a Normal mode search for the text FF2$ finds only the line ‘OPEN (FILE42)FF2$’.

It also seems that searching for strings that begin or end with white space fails.
For example, with text:

searches for ' GOSUB ', ' GOSUB', or 'GOSUB ' fail to find anything, while a search for GOSUB succeeds.

Claudia Frank

@Harold-Worby52

I can’t reproduce that behavior: npp (version 7.2.2) finds all of the FF2$ instances, and
GOSUB with or without spaces before and/or after.
Which find options are you using?
If wrap around is not set, make sure the caret is at the first position of the document.

Cheers
Claudia

Scott Sumner

@Harold-Worby52

Your mention of “whitespace” makes me think of a tabs versus space issue – you may have tab characters in your document, but when you type into the “find what” box you likely use a space; this will result in NO MATCH. It will also not match if you have it the other way around (spaces in document, tab characters in “find what” box), but this is far less likely, because to get a real tab character in the “find what” box you have to go to special lengths.
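
A quick way to see the tab-versus-space point outside Notepad++ is a small Python sketch. The sample line is made up for illustration (it is not taken from Harold's file), and plain substring search is used as a stand-in for a Normal mode search:

```python
# Hypothetical line in which the keyword is surrounded by TAB characters.
line = "0300\tGOSUB\t8000"

# Searching with a leading space finds nothing, because the file holds a tab.
space_hit = line.find(" GOSUB")
# Searching with a real tab character finds the keyword.
tab_hit = line.find("\tGOSUB")

print(space_hit)   # -1
print(tab_hit)     # 4
print(repr(line))  # repr() makes the tabs visible as \t
```

Inside Notepad++ itself, View > Show Symbol > Show White Space and TAB makes the same difference visible.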

guy038

Hello, Harold Worby52,

I think I understand what happened! I bet that you checked, by mistake (or on purpose!), the Match whole word only option

So, just uncheck the Match whole word only option and everything should be OK :-))

To explain the behaviour of the search when this option is ON, I first need to talk about word and NON-word characters

As you may know, a word character is either :

Any letter, accented or not, in upper or lower case, plus a few Unicode characters that are considered letter-like
Any digit from 0 to 9, plus a few Unicode characters that are considered number-like
The Underscore character, _, also called Low Line, with Unicode code-point \x{005F}

So, a NON-word character is, simply, any single character which does NOT belong to the three categories above!

These are the official definitions of a word character ( \w ) and of a NON-word character ( \W ), when the Regular expression search mode is used
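
To illustrate the \w and \W classes, here is a minimal Python sketch. Note the assumption: Python's re module is used only as a stand-in, since Notepad++ uses a different regex engine and the two may disagree on some Unicode characters. The sample characters are arbitrary.

```python
import re

# A few arbitrary sample characters: letters, a digit, underscore, symbols.
samples = ["A", "é", "7", "_", "$", " ", "=", ","]

# \w matches a single word character, \W a single NON-word character.
word = [c for c in samples if re.fullmatch(r"\w", c)]
non_word = [c for c in samples if re.fullmatch(r"\W", c)]

print(word)      # ['A', 'é', '7', '_']
print(non_word)  # ['$', ' ', '=', ',']
```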

Now, when you use the Normal search mode AND the Match whole word only option is enabled, the behaviour is a bit different:

If the FIRST character, of the text, in the Find what zone, belongs to the strict range [0-9A-Za-z_] OR belongs to the range [\x{0080}-\x{ffff}] , then, a match is found, if the PREVIOUS character is, either :
- Any C0 Control character, in the range [\x00-\x1F]
- A character, with Unicode code-point < \x80, which does NOT belong to the strict range [0-9A-Za-z_]
If the FIRST character, of the text, in the Find what zone, does NOT belong to the strict range [0-9A-Za-z_] AND does NOT belong to the range [\x{0080}-\x{ffff}], too, then, a match is found, if the PREVIOUS character is, either :
- Any C0 Control character, in the range [\x00-\x1F]
- A word character, in the strict range [0-9A-Za-z_]
- A character, from the range [\x{0080}-\x{ffff}]

And, in the same way :

If the LAST character, of the text, in the Find what zone, belongs to the strict range [0-9A-Za-z_] OR belongs to the range [\x{0080}-\x{ffff}] , then, a match is found, if the NEXT character is, either :
- Any C0 Control character, in the range [\x00-\x1F]
- A character, with Unicode code-point < \x80, which does NOT belong to the strict range [0-9A-Za-z_]
If the LAST character, of the text, in the Find what zone, does NOT belong to the strict range [0-9A-Za-z_] AND does NOT belong to the range [\x{0080}-\x{ffff}], too, then, a match is found, if the NEXT character is, either :
- Any C0 Control character, in the range [\x00-\x1F]
- A word character, in the strict range [0-9A-Za-z_]
- A character, from the range [\x{0080}-\x{ffff}]

So, Harold, if you insert the string FF2$ in the Find what zone (with a letter as first character and a symbol as last character), you’ll get a match ONLY IF both conditions below are true:

The PREVIOUS character must be, either :
- Any C0 Control character, in the range [\x00-\x1F]
- A character, with Unicode code-point < \x80, which does NOT belong to the strict range [0-9A-Za-z_]
The NEXT character must be, either :
- Any C0 Control character, in the range [\x00-\x1F]
- A word character, in the strict range [0-9A-Za-z_]
- A character, from the range [\x{0080}-\x{ffff}]

Given your original text, below :

   0220 LET FF2$=STBL("SORTD")+"ScrL"+HHRTE$+"."+DATE(JUL_DATE:"%Yz%Mz%Dz"),ERR=221
,--0221 ERASE FF2$,ERR=0222
`->0222 MKEYED FF2$,[1:1:37],[1:3:7]+[1:1:37],0,300
   0224 LET FILE42=UNT;
        OPEN (FILE42)FF2$
   0229 REM

There are four occurrences of the string FF2$

 FF2$=
 FF2$,
 FF2$,
)FF2$

It is easy to see that:

The conditions on the PREVIOUS character are met in all four cases: neither the space nor the closing parenthesis belongs to the range [0-9A-Za-z_]
The conditions on the NEXT character are met ONLY in the last case. Indeed, the = sign and the comma are neither control characters nor characters of the range [0-9A-Za-z_], so, after the $ symbol, they prevent a match. But, in the last case, the string FF2$ is followed by the EOL character, \n, which does belong to the range [\x00-\x1F]. So only that occurrence of FF2$ is found!
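
The rules above can be sketched in a few lines of Python. This is only a model of the behaviour as described in this post, not Notepad++'s actual code. The function names are invented for the illustration, the sample lines are shortened, and treating the very start or end of the document as an acceptable neighbour is an assumption.

```python
import string

ASCII_WORD = set(string.ascii_letters + string.digits + "_")

def is_wordish(ch):
    # In the strict range [0-9A-Za-z_], or any code point from \x80 upward.
    return ch in ASCII_WORD or ord(ch) >= 0x80

def neighbour_ok(edge, neighbour):
    # Assumption: start/end of document counts as an acceptable neighbour.
    if neighbour is None or ord(neighbour) <= 0x1F:  # C0 control character
        return True
    # Otherwise the neighbour must be of the opposite kind to the edge character.
    return is_wordish(edge) != is_wordish(neighbour)

def whole_word_hits(text, needle):
    # Offsets where needle occurs and both neighbours satisfy the rules.
    hits = []
    pos = text.find(needle)
    while pos != -1:
        end = pos + len(needle)
        prev_ch = text[pos - 1] if pos > 0 else None
        next_ch = text[end] if end < len(text) else None
        if neighbour_ok(needle[0], prev_ch) and neighbour_ok(needle[-1], next_ch):
            hits.append(pos)
        pos = text.find(needle, pos + 1)
    return hits

# Shortened version of the sample text from this thread.
sample = (
    '   0220 LET FF2$=STBL("SORTD")\n'
    ',--0221 ERASE FF2$,ERR=0222\n'
    '`->0222 MKEYED FF2$,[1:1:37]\n'
    '   0224 LET FILE42=UNT;\n'
    '        OPEN (FILE42)FF2$\n'
    '   0229 REM\n'
)

plain_count = sample.count("FF2$")
hits = whole_word_hits(sample, "FF2$")
lines = sample.splitlines()
matched_lines = [lines[sample[:h].count("\n")] for h in hits]

print(plain_count)    # 4
print(matched_lines)  # ['        OPEN (FILE42)FF2$']
```

With the option off, all four occurrences are found. With the modelled whole-word rule, only the occurrence followed by the line ending survives, which is what Harold observed.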

As you can easily see, the handling of the Match whole word only option is quite complicated. So, I would advise you to enable this option ONLY IF the text in the Find what: field begins and ends with a true word character, that is to say, if you search for a classical word!!

Best Regards,

guy038

Scott Sumner

@guy038 said:

[0-9A-Za-a-z_]

You used the above many times (probably via copy and paste) in your great post explaining the “whole words” option to the “Find” dialog – I had no idea of the depth of it, but had in the past noticed strange behavior when using it with non-“word” characters… – but I think you actually meant the following:

[0-9A-Za-z_]

guy038

Hi, Scott,

Many thanks for spotting my mistake! Ah, I carried my error through the whole of my post :-((

So, simple lesson: beware of what you actually paste!! BTW, I’ve already updated my previous post

Scott, you’re really an attentive reader, aren’t you? Brrrr! I ought to be careful when I write future posts ;-)

But, Scott, if I make an obvious grammar mistake, or if I am totally wrong about an English expression or word, don’t hesitate to tell me about it! It will improve “my” English, which is far from fluent!

Cheers,

guy038