How long before bugs are even acknowledged? 6 months & counting!

guy038

Hello, @Turtletronic, and All,

The good news is that you can close your open issue on GitHub ;-))

As you certainly know, a Notepad++ search can be performed in three different ways:

With the Normal search mode
With the Extended search mode
With the Regular expression search mode

It turns out that the wrong behavior you noticed occurs ONLY IF:

You use the Normal or Extended search mode

AND

You do NOT tick the Match case option

Indeed, when these two conditions are true, a search for SS, Ss, sS, ss or ß matches each of the four strings SS, Ss, sS, ss and the German character ß.

But:

If the Match case option is ticked, whatever the search mode:
- A search for SS matches only the string SS
- A search for Ss matches only the string Ss
- A search for sS matches only the string sS
- A search for ss matches only the string ss
- A search for ß matches only the character ß

If the Match case option is NOT ticked AND the Regular expression mode is used:
- A search for SS, Ss, sS or ss matches each of the four strings SS, Ss, sS and ss
- A search for ß matches only the character ß
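For illustration only, the three behaviours above can be reproduced with a small Python model. It assumes that Normal/Extended mode without Match case behaves like Unicode full case folding (Python's `str.casefold()`), that Regular expression mode behaves like per-character lowercasing (`str.lower()`), and that Match case is an exact comparison. This is not Notepad++ or Scintilla code; `matches` and `CANDIDATES` are hypothetical names.

```python
# Hypothetical model of the observed search behaviour; NOT Notepad++/Scintilla code.
# Python's built-in string methods stand in for the editor's comparison logic.
CANDIDATES = ["SS", "Ss", "sS", "ss", "ß"]


def matches(needle: str, candidate: str, mode: str) -> bool:
    """Compare two whole strings the way each (assumed) search mode would."""
    if mode == "match_case":    # Match case ticked: exact comparison
        return needle == candidate
    if mode == "simple_lower":  # assumed regex-mode-like: lowercase each character
        return needle.lower() == candidate.lower()
    if mode == "full_fold":     # assumed Normal/Extended-like: full case folding (ß -> ss)
        return needle.casefold() == candidate.casefold()
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("match_case", "simple_lower", "full_fold"):
        print(mode)
        for needle in CANDIDATES:
            hits = [c for c in CANDIDATES if matches(needle, c, mode)]
            print(f"  {needle!r:6} -> {hits}")
```

Under this model, "ß" matches all five strings only in the `full_fold` mode, which is exactly the combination (Normal/Extended mode, Match case unticked) where the unexpected hits were reported.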

Best Regards,

guy038

Turtletronic

Thanks for your reply,

The German lowercase “Eszett” (HTML: &szlig;) is NOT freely interchangeable with ss in any case combination, so a search for a single Eszett character that also lists other hits which do not match the search term, just because MATCH CASE is not used, is incorrect. One could say plain buggy.

Users rely on the DEFAULT behavior: a search for a lowercase ß should find both lowercase and uppercase Eszett characters unless MATCH CASE is set. Listing inappropriate search results (like all case combinations of ‘ss’) is plain WRONG. When looking for a-acute (without MATCH CASE), you would also expect to get only a-acute and A-acute as hits, and not all case variants including a-cedilla, a-grave, a-acute, etc., just because MATCH CASE was not set, right?

Your “workaround” does not fix the bad coding here, sorry.

PeterJones

If the workaround is not sufficient for you, then you have some other options:

live with it not working
find an alternative software
add more details to the issue ticket, hoping to politely convince the maintainers to make the fix
make the code changes yourself (or pay someone to do it), and submit a pull request, hoping the maintainers will integrate your change

Scott Sumner

IIRC, searching in Normal mode in Notepad++ is conducted solely via Scintilla’s searchInTarget function.

I say normal mode here because I believe that’s the mode the OP is using, and I don’t want to cloud the issue by possibly bringing Boost into it (for regular expressions), and extended-mode, well, ugh! I’m not even going there…

Anyway, could it be that Notepad++'s old version of Scintilla is the culprit here? I tried a version of Scintilla’s demo editor (Sc1, v.4.0.4) with a newer Scintilla version and it seemed to handle the replacement correctly (it left the ss combinations alone). If the old version of Scintilla is at fault, then there isn’t much that Notepad++ can do about it (except upgrade it!).

I’m just throwing out ideas…it would be great if someone tells me I’m wrong. :)

dinkumoil

@Turtletronic

I would like to advise you to avoid harsh wording when you request a bug fix. It is not effective, as you can see from the reactions you got. I’m German too and want this issue to get fixed ASAP, but we have to be patient.

Scott Sumner

@dinkumoil said:

avoid harsh wording

It all seems fairly civilized to me. :-) Both the questions and the replies. I’ve seen (and done!) much worse. Oddly the thing that bothers me most about this thread is its strange title…

dinkumoil

I have some news concerning this issue.

But first I want to clarify that the issue reported by @Turtletronic only appears when working with a Unicode-encoded file (UTF-8, UCS-2 LE/BE). This is important for reproducing the “bug” and also for understanding what follows.

My investigations

I grepped for \<ss\> (case-insensitive, at word boundaries) in the whole Notepad++ source code repository (including the Scintilla tree). As a result, I found the file CaseConvert.cxx, which is needed to compile SciLexer.dll.
I had to learn how to build notepad++.exe and SciLexer.dll myself, and was impressed by how much hassle Microsoft managed to produce with its various versions of Visual Studio: I needed VS 2013 to compile the Boost library and SciLexer.dll, and VS 2015 to compile Notepad++, because Npp’s source code uses C++ features that the VS 2013 compiler does not support, even though there is a VS 2013 project file for Npp.
I changed a conversion table I found in CaseConvert.cxx. This is a mapping table for special characters (like the German ß) to their folded, uppercase and lowercase equivalents (what on earth is a “folded character”?). By changing this table I was able to prevent Notepad++ from finding ss/sS/Ss/SS when I actually searched (case-insensitively) for ß. This proves that I found the correct piece of source code.
I became aware that there is a thing called case folding, which is used for case-insensitive comparisons of strings. Case folding is a special type of lowercasing that makes case-insensitive comparisons more thorough than plain lowercasing. I found helpful explanations here and here. The case folding algorithm is part of the Unicode Standard (Section 3.13, Default Case Algorithms), and there is also a file on the FTP servers of the Unicode Consortium describing a case folding mapping (see ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt). So it was no surprise that the mapping table in CaseConvert.cxx seemed to be auto-generated from this file.
The mapping table (in the Unicode Consortium document) says that the “Latin small letter sharp S” (German ß) is transformed by case folding to “ss”. For the “Latin capital letter sharp S” there are two mappings: “ß” (simple case folding) and “ss” (full case folding). The mapping table in CaseConvert.cxx uses the full case folding variant.
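To make the simple/full distinction concrete, here is a minimal Python sketch that builds both folding maps from a few lines in the CaseFolding.txt format (`<code>; <status>; <mapping>; # <name>`). Per that file's header, simple folding uses the entries with status C and S, and full folding uses C and F. The four sample lines are copied in by hand rather than downloaded, and `load_folding` and `fold` are hypothetical helper names; this is not how CaseConvert.cxx is generated.

```python
# Minimal sketch: build simple and full case-folding maps from CaseFolding.txt-style lines.
# Only a hand-picked excerpt is used; the real file has many more entries.
SAMPLE = """\
0053; C; 0073; # LATIN CAPITAL LETTER S
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def load_folding(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Return (simple, full) folding maps: simple = C + S entries, full = C + F entries."""
    simple: dict[str, str] = {}
    full: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = (field.strip() for field in line.split(";")[:3])
        src = chr(int(code, 16))
        dst = "".join(chr(int(cp, 16)) for cp in mapping.split())
        if status in ("C", "S"):
            simple[src] = dst
        if status in ("C", "F"):
            full[src] = dst
    return simple, full


def fold(s: str, table: dict[str, str]) -> str:
    """Fold each code point independently; unmapped characters are kept as-is."""
    return "".join(table.get(ch, ch) for ch in s)


if __name__ == "__main__":
    simple, full = load_folding(SAMPLE)
    print(fold("ẞ", simple), fold("ẞ", full))  # ß ss
```

With the full map, both ß and the capital ẞ fold to “ss”, which is why a folded comparison cannot tell them apart from any case combination of ss. Python's own `str.casefold()` also applies full case folding, so `"ß".casefold()` gives `"ss"` as well.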

Conclusion

1. When searching case-insensitively, Notepad++ transforms search strings into a form which is more suitable for case-insensitive string comparison than simple lowercasing. When doing this, it follows a mapping table for special characters defined in the Unicode Standard.
2. Changing this behaviour of Notepad++ is only possible by changing the source code of the Scintilla project.
3. Because of 1., we cannot expect that 2. will ever happen.

Final result

If somebody wants to search for ß, they have to use a case-sensitive search. Since the “Latin capital letter sharp S” was adopted into official German spelling in 2017, this makes sense anyway.
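The consequence for searching, and the case-sensitive workaround, can be sketched in Python. This assumes a search that compares full-case-folded text (via `str.casefold()`) instead of Scintilla's actual implementation. Because folding can change length (ß becomes ss), the sketch folds candidate substrings of the original text one character at a time, so match positions still refer to the unfolded text. `find_all_casefold` and `find_all_exact` are hypothetical names, and overlapping matches are reported.

```python
# Hypothetical sketch: case-insensitive search via full case folding vs. case-sensitive search.
# Not Scintilla's algorithm; brute force, for illustration only.


def find_all_casefold(text: str, needle: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of text whose full case folding equals needle's."""
    target = needle.casefold()
    spans = []
    for start in range(len(text)):
        folded = ""
        for end in range(start + 1, len(text) + 1):
            # Folding is per code point, so folding char by char equals folding the substring.
            folded += text[end - 1].casefold()
            if not target.startswith(folded):
                break
            if folded == target:
                spans.append((start, end))
                break
    return spans


def find_all_exact(text: str, needle: str) -> list[tuple[int, int]]:
    """Case-sensitive search: only identical code points match."""
    spans = []
    start = text.find(needle)
    while start != -1:
        spans.append((start, start + len(needle)))
        start = text.find(needle, start + 1)
    return spans


if __name__ == "__main__":
    text = "Straße STRASSE strasse"
    print(find_all_casefold(text, "ß"))  # [(4, 5), (11, 13), (19, 21)]
    print(find_all_exact(text, "ß"))     # [(4, 5)]
```

The folded search for “ß” also hits “SS” in STRASSE and “ss” in strasse, while the case-sensitive search finds only the ß itself.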

chcg

@dinkumoil
Regarding building N++.
You may ease your task by adding the Scintilla VS project https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/scintilla/win32/SciLexer.vcxproj and https://github.com/notepad-plus-plus/notepad-plus-plus/blob/master/scintilla/win32/packages.config, but that is still at VS2013, as an update was refused. So you have to manually update to e.g.

for the boost regex support via nuget.

See, e.g., for updating everything to VS2015:
https://github.com/notepad-plus-plus/notepad-plus-plus/pull/2464
and for building Scintilla with newer Boost and VS command-line versions:
https://github.com/notepad-plus-plus/notepad-plus-plus/pull/2336

dinkumoil

@chcg

Thank you for your advice. Meanwhile I was able to find a solution by myself. I found this guide to build Notepad++ from source. With its help I was able to compile Boost, SciLexer.dll and Notepad++ with VS 2015.

guy038

Hello, dinkumoil and All,

Your report of all your investigations was really interesting. So, as you say, we must live with that fact!

In return, you may also be interested in reading these two technical reports from the Unicode Consortium, which can serve as reference texts:

http://www.unicode.org/reports/tr10/

http://www.unicode.org/reports/tr18/

I just had a glance at these documents, reading some parts, and, like me, you’ll understand the complexity and richness of all the technical solutions envisaged to satisfy everyone, regardless of their mother tongue ;-))

By the way, I really like this sentence from the preamble of the page “What is Unicode?”:

Unicode provides a unique number for every character,

no matter what the platform,
no matter what the program,
no matter what the language.

Best Regards,

guy038