How long before bugs are even acknowledged? 6 months & counting!
-
Hello, @Turtletronic, and All,
The good news is that you can close your open issue, on GitHub ;-))
As you certainly know, a Notepad++ search can be performed in 3 different ways:
- With the Normal search mode
- With the Extended search mode
- With the Regular expression search mode
And it happens that the wrong behavior you noticed occurs ONLY IF:
- You use the Normal or Extended search mode
AND
- You do NOT tick the Match case option
Indeed, when these two conditions are true, a search for SS, Ss, sS, ss or ß matches, each time, the four strings SS, Ss, sS, ss and the German character ß.
But:
- If the Match case option is ticked, whatever the search mode used:
  - A search for SS matches the string SS only
  - A search for Ss matches the string Ss only
  - A search for sS matches the string sS only
  - A search for ss matches the string ss only
  - A search for ß matches the character ß only
- If the Match case option is NOT ticked AND the Regular expression mode is used:
  - A search for SS, Ss, sS or ss matches, each time, the four strings SS, Ss, sS and ss
  - A search for ß matches the character ß only
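To make the difference between the two non-"Match case" behaviours concrete, here is a small hypothetical Python model. It is not Notepad++ or Scintilla code: it merely assumes that Normal/Extended mode behaves like comparing Unicode full case-folded strings (Python's `str.casefold()`, which turns "ß" into "ss"), and that Regular expression mode behaves like comparing lowercased strings (`str.lower()`, which leaves "ß" unchanged). The names `CANDIDATES`, `full_fold_equal`, `lower_equal` and `hits` are made up for this sketch.

```python
# Hypothetical model of the matching behaviour described above; this is
# NOT Notepad++/Scintilla code. Assumptions:
#   - Normal/Extended mode without "Match case" behaves like comparing
#     full case-folded strings (Python's str.casefold()).
#   - Regular expression mode without "Match case" behaves like comparing
#     lowercased strings (str.lower()), which leaves "ß" unchanged.

CANDIDATES = ["SS", "Ss", "sS", "ss", "ß"]


def full_fold_equal(a: str, b: str) -> bool:
    """Equal after Unicode full case folding ("ß" folds to "ss")."""
    return a.casefold() == b.casefold()


def lower_equal(a: str, b: str) -> bool:
    """Equal after plain lowercasing ("ß" stays "ß")."""
    return a.lower() == b.lower()


def hits(needle: str, equal) -> list[str]:
    """Return every candidate that the given comparison treats as a match."""
    return [c for c in CANDIDATES if equal(needle, c)]


if __name__ == "__main__":
    for needle in ["SS", "ß"]:
        print(f"{needle!r}: full folding -> {hits(needle, full_fold_equal)}, "
              f"lowercasing -> {hits(needle, lower_equal)}")
```

Running it prints that both SS and ß hit all five strings under full folding, whereas under lowercasing SS hits only the four ss variants and ß hits only itself, which mirrors the Normal/Extended versus Regular expression results listed above.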
Best Regards,
guy038
-
-
Thanks for your reply,
The German lowercase “Eszett” (HTML: &szlig;) is NOT freely interchangeable with ss in any combination of capitals, so a search for one Eszett character that also lists other, unrelated hits, just because MATCH CASE is not used, is incorrect. One could say plain buggy.
Users rely on the DEFAULT being that a search for a lowercase ß finds both lowercase and uppercase Eszett characters unless MATCH CASE is set. Listing inappropriate search results (like all case combinations of ‘ss’) is plain WRONG. When looking for a-acute (without MATCH CASE), you would also expect to get only a-acute and A-acute as hits, and not all variants including a-cedilla, a-grave, etc., just because MATCH CASE was not set, right?
Your “workaround” does not fix the bad coding here, sorry.
-
If the workaround is not sufficient for you, then you have some other options:
- live with it not working
- find an alternative software
- add more details to the issue ticket, hoping to politely convince the maintainers to make the fix
- make the code changes yourself (or pay someone to do it), and submit a pull request, hoping the maintainers will integrate your change
-
IIRC, Normal-mode searching in Notepad++ is conducted solely via Scintilla’s searchInTarget function. I say Normal mode here because I believe that’s the mode the OP is using, and I don’t want to cloud the issue by possibly bringing Boost into it (for regular expressions), and Extended mode, well, ugh! I’m not even going there…
Anyway, could it be that Notepad++'s old version of Scintilla is the culprit here? I tried a version of Scintilla’s demo editor (Sc1, v4.0.4) with a newer Scintilla version, and it seemed to handle the replacement correctly (it left the ss combinations alone). If the old version of Scintilla is at fault, then there isn’t much that Notepad++ can do about it (except upgrade it!). I’m just throwing out ideas… it would be great if someone tells me I’m wrong. :)
-
I would like to advise you to avoid harsh wording when you request a bug fix. It is not effective, as you can see from the reactions you got. I’m German too and want this issue to get fixed ASAP, but we have to be patient.
-
@dinkumoil said:
avoid harsh wording
It all seems fairly civilized to me. :-) Both the questions and the replies. I’ve seen (and done!) much worse. Oddly the thing that bothers me most about this thread is its strange title…
-
I have some news concerning this issue.
But first I want to clarify that the issue reported by @Turtletronic only appears when working with a Unicode-encoded file (UTF-8, UCS-2 LE/BE). This is important for reproducing the “bug” and also for understanding what follows.
My investigations
- I grepped for \<ss\> (case insensitive, at word boundaries) in the whole Notepad++ source code repository (including the Scintilla tree). As a result I found the file CaseConvert.cxx, which is needed to compile SciLexer.dll.
- I had to learn how to build notepad++.exe and SciLexer.dll by myself (and was impressed by how much hassle Microsoft manages to produce with its various versions of Visual Studio: I needed to install VS 2013 to compile the Boost library and SciLexer.dll, and VS 2015 to compile Notepad++, because somebody managed to use C++ features in Npp’s source code that the VS 2013 C++ compiler does not support, even though there is a VS 2013 project file for Npp).
- I changed a conversion table I found in CaseConvert.cxx. This is a mapping table from special characters (like the German ß) to their folded, uppercase and lowercase equivalents (What on earth is a “folded character”?). By changing this table I was able to prevent Notepad++ from finding ss/sS/Ss/SS when I actually searched (case insensitive) for ß. This proves that I found the correct piece of source code.
- I became aware of something called case folding, which is used for case-insensitive comparison of strings. Case folding is a special kind of lowercasing that makes strings comparable in a more comprehensive way. I found helpful explanations here and here. The case folding algorithm is part of the Unicode Standard (Section 3.13, Default Case Algorithms), and there is also a file on the Unicode Consortium's FTP server describing a case folding mapping (see ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt). It was no longer surprising that the mapping table in CaseConvert.cxx seemed to be auto-generated from this file.
- The mapping table (in the Unicode Consortium document) says that “Latin small letter sharp S” (German ß) is transformed by case folding to “ss”. For “Latin capital letter sharp S” there are two mappings: “ß” (simple case folding) and “ss” (full case folding). The mapping table in CaseConvert.cxx uses the full case folding variant.
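To illustrate the simple/full distinction, here is a minimal Python sketch that builds both kinds of folding table from lines in the CaseFolding.txt format (`code point; status; mapping; # name`). Only four entries are embedded, written in that format for illustration; a real tool would read the whole file. The helpers `parse_case_folding` and `fold` are hypothetical and are not how CaseConvert.cxx is actually generated. Status C (common) goes into both tables, S (simple) only into the simple one, F (full) only into the full one, and T (Turkic) lines would be ignored.

```python
# Hypothetical sketch: build "simple" and "full" case-folding maps from
# lines in the format of the Unicode Consortium's CaseFolding.txt.
# Only a few entries are embedded here; a real tool would read the file.

SAMPLE = """\
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def parse_case_folding(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (simple, full) maps from one character to its folded form."""
    simple: dict[str, str] = {}
    full: dict[str, str] = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, status, mapping = (f.strip() for f in data.split(";")[:3])
        char = chr(int(code, 16))
        target = "".join(chr(int(cp, 16)) for cp in mapping.split())
        if status in ("C", "S"):   # common + simple -> simple folding
            simple[char] = target
        if status in ("C", "F"):   # common + full -> full folding
            full[char] = target
        # status "T" (Turkic special cases) is ignored in this sketch
    return simple, full


def fold(text: str, table: dict[str, str]) -> str:
    """Fold each character; characters without an entry stay unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


simple, full = parse_case_folding(SAMPLE)

if __name__ == "__main__":
    for ch in ["ß", "\u1E9E"]:
        print(f"U+{ord(ch):04X}: simple={fold(ch, simple)!r} "
              f"full={fold(ch, full)!r}")
```

For both U+00DF (ß) and U+1E9E (ẞ) this prints simple='ß' and full='ss'. With a full table, "ß" and "ss" fold to the same string, so a case-insensitive search built on it cannot tell them apart, which is the behaviour observed in Notepad++.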
Conclusion
1. When searching case-insensitively, Notepad++ transforms search strings into a form that is more suitable for case-insensitive comparison than simple lowercasing. In doing so it follows a mapping table for special characters defined in the Unicode Standard.
2. Changing the behaviour of Notepad++ is only possible by changing the source code of the Scintilla project.
3. Because of 1., we cannot expect that 2. will ever happen.
Final result
If somebody wants to search for ß, they have to use a case-sensitive search. Since the “Latin capital letter sharp S” was adopted into official German spelling in 2017, this makes sense anyway.
-
@dinkumoil
Regarding building N++:
You can make your task easier by adding the Scintilla VS project https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/scintilla/win32/SciLexer.vcxproj and https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/scintilla/win32/packages.config, but that project is still at VS2013, as an update was refused. So you have to manually update packages.config to e.g.
<package id="boost" version="1.68.0.0" targetFramework="native" />
<package id="boost_regex-vc141" version="1.68.0.0" targetFramework="native" />
for the Boost regex support via NuGet.
See e.g. this pull request for updating everything to VS2015:
https://github.com/notepad-plus-plus/notepad-plus-plus/pull/2464
and this one for building Scintilla with newer Boost and VS command-line versions:
https://github.com/notepad-plus-plus/notepad-plus-plus/pull/2336
-
Thank you for your advice. Meanwhile I was able to find a solution by myself. I found this guide to build Notepad++ from source. With its help I was able to compile Boost, SciLexer.dll and Notepad++ with VS 2015.
-
Hello, dinkumoil and All,
Your report of all your investigations was really interesting. So, as you say, we must live with that fact!
In return, you may also find it interesting to read these two technical reports from the Unicode Consortium, which can be used as reference texts:
http://www.unicode.org/reports/tr10/
http://www.unicode.org/reports/tr18/
Just have a glance at these documents, reading some parts, and, like me, you’ll understand the complexity and richness of all the technical solutions envisaged to satisfy everyone, regardless of their mother tongue ;-))
By the way, I do like this sentence, given in the preamble of the page What is Unicode?:
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.
Best Regards,
guy038
-