    Search++: A work in progress

    Notepad++ & Plugin Development
    • Vitalii Dovgan

      This is very interesting!

      Maybe you’ll also consider an alternate UI (in addition to the main one) in the form of a small one-line search panel similar to the one in Sublime Text or similar to this one:
      https://github.com/d0vgan/AkelPad-Plugs-QSearch

      To be honest, now I prefer the Incremental Search panel in Notepad++ to the Find dialog because the Incremental Search panel occupies much less space, even though it’s not as powerful as the Find dialog.
      The painful points of the Incremental Search panel are:

      • it does not clear the last word or the entire Find What field by Ctrl+Backspace. Instead, a stupid unreadable symbol is inserted. (Yes, I know, Microsoft forces us to write own handler of Ctrl+Backspace in each and every instance of an Edit control, what a shame),
      • it does not forward Ctrl+Tab to the main window, thus not allowing to switch between tabs while in the Incremental Search panel.
      • guy038

        Hello, @coises and All,

        • Regarding the Bookmarks :

        I’m pretty dumb for not thinking of the N++ native command Search > Bookmark > Clear All Bookmarks or even better : a right mouse click within the Bookmark margin with the same option !


        • Regarding the selection concept :

        Many thanks for your explanations. So, if I understand you clearly, we need to transform the selection(s) in Marked Text, first and then use the Find in Mark Text option


        • In your initial post, near the end of the Features section, you said :

        Regular expression searches in Search++ perform a fully Unicode-based search using a customized combination of Boost.Regex and ICU4C. In particular, this produces fewer “surprising” results with Unicode characters above 0xFFFF (including most emoji) and when searching in documents using a DBCS code page (which in Notepad++ can be Chinese, Japanese or Korean files that are in the system default encoding instead of in Unicode).

        Then, at the end of the Quirks and features ... section :

        The ICU button at the top is there mostly for testing. It uses the regular expression engine built into ICU, which has different syntax than the familiar Boost.Regex engine and does not integrate as well with Scintilla. Replace is not implemented for this search engine, and it only works on Unicode documents. It will probably be removed when Search++ reaches version 1.0, as it really isn’t very useful except as a check on the results from the main Regex engine (since I’ve meddled with the main Regex engine quite a lot, and I haven’t modified the ICU engine in any way).

        And later, at the end of the Missing and Planned Features ... section :

        I hope to add more features to the regular expression search. The current version is almost identical to the search in Columns++, but presented in what is hopefully a more flexible and user-friendly interface. It should be more accurate for Unicode-derived properties since it uses ICU4C directly instead of working from the home-grown parse of Unicode tables used in Columns++. If I can work out a way, I hope to add Unicode word breaks and more Unicode properties.

        So, some questions :

        • When clicking on the Regex button, do we use your Unicode search engine, as in Columns++ or is it a mix of the Columns++ version and ICU ?

        • Oddly, if we choose the ICU button, the Replace and Replace All buttons are not greyed and seem functional, contrary to what you said ?!

        • Can you recommend a few websites, speaking about ICU and the Unicode Word Boundaries specificity ?

        • Presently, when hitting the ICU button, are searches like \p{alphabetic} or \p{XID_Continue} possible against my Total_Chars file of 325,590 characters ?

        TIA for all your answers !

        Best Regards,

        guy038

        • Coises @Vitalii Dovgan

          @Vitalii-Dovgan said in Search++: A work in progress:

          Maybe you’ll also consider an alternate UI (in addition to the main one) in the form of a small one-line search panel

          Probably not one line, but reasonably compact should be possible. At present you can dock the docking Search++ dialog to the top or bottom instead of the left or right, if you like that better. The layout adapts, but it doesn’t use the full width as well as it could — right now there are only horizontal and vertical layouts, and I need to work out an “ultra-wide” layout that puts all the buttons and check boxes into a single row when the dialog is wide enough. I don’t see any reason that can’t be done, though.

          The painful points of the Incremental Search panel are:

          • it does not clear the last word or the entire Find What field by Ctrl+Backspace. Instead, a stupid unreadable symbol is inserted. (Yes, I know, Microsoft forces us to write own handler of Ctrl+Backspace in each and every instance of an Edit control, what a shame),

          Search++ does that now. (That must be the default command for Ctrl+Backspace in Scintilla, since I did nothing special to make it work. I’ve never used Ctrl+Backspace.)

          • it does not forward Ctrl+Tab to the main window, thus not allowing to switch between tabs while in the Incremental Search panel.

          I see regular Notepad++ search doesn’t do that either. (It uses Ctrl+Tab to switch dialog tabs, though, so that makes sense.) Search++ doesn’t do it now; I don’t know if it’s possible (particularly from a docked dialog) but I will see if it can be done.

          At present you can switch rapidly to the main Notepad++ window with Ctrl+N. If you’ve set a shortcut for Search++ you can then use that to switch back again. I know that’s still extra keystrokes, so I will see if Ctrl+Tab can be forwarded, since it’s not used for anything in Search++.

          Thank you for your observations and suggestions!

          • Vitalii Dovgan @Coises

            @Coises
            It should be possible to forward Ctrl+Tab and Ctrl+Shift+Tab by processing WM_KEYDOWN with VK_TAB in your dialog’s DlgProc similarly to this:
            https://github.com/d0vgan/AkelPad-Plugs-QSearch/blob/master/Source/QSearch/QSearchDlg.c#L4569

            Interestingly, the Right Ctrl key often emulates Ctrl+Alt, so when you verify only the presence of VK_TAB and VK_CONTROL (like in the code mentioned above), this code also works for RightCtrl+Tab which becomes VK_TAB and VK_CONTROL and VK_MENU. (VK_MENU is the Alt key. Unlike the real Alt key that comes under WM_SYSKEYDOWN, the “Ctrl+Alt” from RightCtrl comes under WM_KEYDOWN).

            Oh, WM_KEYUP should be handled as well:
            https://github.com/d0vgan/AkelPad-Plugs-QSearch/blob/master/Source/QSearch/QSearchDlg.c#L4607

            • Coises @guy038

              @guy038 said in Search++: A work in progress:

              So, if I understand you clearly, we need to transform the selection(s) in Marked Text, first and then use the Find in Mark Text option

              Yes; or click the Tools button, open Settings and check Convert selections to marked text before beginning a stepwise search to have Search++ do it automatically. Otherwise, multiple searches that don’t affect the selection (like Count or Find All or Replace All) will work within the selection, but only the first stepwise Find (or the preliminary find in a stepwise Replace) will be constrained to the selection, since after that the original selection will be gone.

              • When clicking on the Regex button, do we use your Unicode search engine, as in Columns++ or is it a mix of the Columns++ version and ICU ?

              It’s the Columns++ search engine, except for one thing. Previously I could not figure out how to incorporate ICU4C into the plugin, so for Columns++ I devised a Python program that reads several of the Unicode character data files and writes C++ code that compiles into a gigantic table containing the information I needed. I stumbled on the way to use ICU4C shortly before I began working on Search++; instead of building and using those tables, I go straight to ICU4C for information (questions like, “What is the general category of this character?” or “Is this a lower case character?”).
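
              As a rough illustration of the table-generation approach described above (not the actual Columns++ script, which is not shown in this thread), a Python program can read lines in the UnicodeData.txt format and emit C++ source. The names `emit_category_table`, `CategoryEntry` and `kCategories` are hypothetical, and only the general category field is extracted here:

```python
def emit_category_table(unicode_data_lines):
    """Turn UnicodeData.txt-style lines into C++ source for a category table.

    Each input line is semicolon-separated: field 0 is the code point in hex,
    field 2 is the general category (for example "Lu" or "Ll").
    CategoryEntry and kCategories are made-up names for this sketch.
    """
    rows = []
    for line in unicode_data_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(";")
        code_point = int(fields[0], 16)
        category = fields[2]
        rows.append(f'    {{0x{code_point:04X}, "{category}"}},')
    header = "static const CategoryEntry kCategories[] = {"
    return "\n".join([header, *rows, "};"])


sample_lines = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
]
print(emit_category_table(sample_lines))
```

              A real generator would presumably also need to handle code point ranges and the other data files mentioned above; this sketch only shows the read-and-emit shape of the idea.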

              It might turn out that this will have an efficiency impact (better or worse? — I don’t know). It should fix some of the errors in Columns++, like [[:lower:]] missing characters that are lower case but not letters.
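
              To make the "lower case but not letters" case concrete, here is a small sketch using Python's standard `unicodedata` module rather than ICU4C. It assumes that CPython's `str.islower()` follows the Unicode Lowercase property, which is wider than general category Ll; the helper name `lowercase_non_letters` and the sample characters are illustrative only:

```python
import unicodedata


def lowercase_non_letters(chars):
    """Return characters that count as lower case but are not in category Ll."""
    return [c for c in chars
            if c.islower() and unicodedata.category(c) != "Ll"]


# U+2170 is SMALL ROMAN NUMERAL ONE, U+24D0 is CIRCLED LATIN SMALL LETTER A.
samples = ["a", "A", "\u2170", "\u24d0", "1"]
for c in lowercase_non_letters(samples):
    print(f"U+{ord(c):04X} {unicodedata.name(c)} category={unicodedata.category(c)}")
```

              A test for lower case that looks only at category Ll would skip characters like these, which is the kind of gap a property-based lookup is meant to close.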

              • Oddly, if we choose the ICU button, the Replace and Replace All buttons are not greyed and seem functional, contrary to what you said ?!

              They’re not disabled, but all they do is return the message, “Command not implemented.”

              • Can you recommend a few websites, speaking about ICU and the Unicode Word Boundaries specificity ?

              I don’t really have anything except the Unicode documentation. In my brief testing, the practical effect in English is that words like can't are recognized as a single word. Most regular expression engines define a word boundary (\b) in terms of what is a word character (\w). The regular expression engine in ICU lets you do that, but it also provides an option to use Unicode word boundaries to define \b.
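
              The difference can be sketched with Python's `re` module, which has no Unicode word boundary option. The second pattern below is only a rough approximation of one Unicode rule (an apostrophe between letters does not end a word); it is not the ICU implementation and ignores the rest of the Unicode word-break algorithm:

```python
import re

text = "I can't go"

# Classic definition: a word is a run of \w characters, so the apostrophe splits "can't".
classic_words = re.findall(r"\w+", text)

# Rough approximation: allow an ASCII or typographic apostrophe between runs of \w.
approx_unicode_words = re.findall(r"\w+(?:['\u2019]\w+)*", text)

print(classic_words)
print(approx_unicode_words)
```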

              • Presently, when hitting the ICU button, are searches like \p{alphabetic} or \p{XID_Continue} possible against my Total_Chars file of 325,590 characters ?

              Yes. You can even use things like \p{script=Greek}. Unfortunately, I haven’t been able to find any place where ICU documents its own regular expression syntax. The regular-expressions.info web site includes ICU among the regex dialects it shows.

              • Coises @guy038

                @guy038 said in Search++: A work in progress:

                I’m a bit annoyed to not be able to clear this panel at any time and that I need to close and re-open a N++ session to that purpose ! Personally, an option in the Tools menu, to clear the Search++ Results panel would be great !

                Regarding the search direction :
                

                I do appreciate to temporarily reverse the search direction, with native N++ search, by hitting or releasing the Shift key ! Would it be possible to add this functionality to Search++ plugin ?

                These features, and some bug fixes, are in version 0.3.

                • guy038

                  Hi, @coises and All,

                  Thanks for your new Search++_03 release !

                  BTW, with native N++ search, the Shift + Enter shortcut is also available when you choose the Regular expression search mode ( with the condition that the regexBackward4PowerUser="yes" option is present within the config.xml file ). Maybe you could allow it as well in Search++ ?


                  I just discovered ICU’s features, and they’re really impressive ! Over the next few days, I’ll try to list the many Unicode properties accessible through ICU… Another whole new world is opening up to me !! Personally, I think the ICU button should remain available in future versions !


                  I ran into a problem while selecting characters. For example :

                  • Put this small text in new tab
                  
                  
                  ໜໝໞໟ໠໡໢໣໤໥໦໧໨໩໪໫໬໭໮໯໰໱໲໳໴໵໶໷໸໹໺໻໼໽໾໿ༀ༁༂༃༄༅༆༇༈༉༊་༌།༎༏༐༑༒༓༔༕༖༗༘༙༚༛༜༝༞༟༠༡༢༣༤༥༦༧༨༩༪༫༬༭༮༯༰༱༲༳༴༵༶༷༸༹༺༻༼༽༾༿ཀཁགགྷངཅཆཇ཈ཉཊཋཌཌྷཎཏཐདདྷནཔཕབབྷམཙཚཛཛྷཝཞཟའཡརལཤཥསཧཨཀྵཪཫཬ཭཮཯཰ཱཱཱིིུུྲྀཷླྀཹེཻོཽཾཿ྄ཱྀྀྂྃ྅྆྇ྈྉྊྋྌྍྎྏྐྑྒྒྷྔྕྖྗ྘ྙྚྛྜྜྷྞྟྠྡྡྷྣྤྥྦྦྷྨྩྪྫྫྷྭྮྯྰྱྲླྴྵྶྷྸྐྵྺྻྼ྽྾྿࿀࿁࿂࿃࿄࿅࿆࿇࿈࿉࿊࿋࿌࿍࿎࿏࿐࿑࿒࿓࿔࿕࿖࿗࿘࿙࿚࿛࿜࿝࿞࿟࿠࿡࿢࿣࿤࿥࿦࿧࿨࿩࿪࿫࿬࿭࿮࿯࿰࿱࿲࿳࿴࿵࿶࿷࿸࿹࿺࿻࿼࿽࿾࿿ကခဂဃငစဆဇဈဉညဋဌ
                  
                  
                  
                  • Switch to this new tab

                  • Run Plugins > Search++ > Search...

                  • Select the ICU button

                  • SEARCH \p{script=Tibetan}

                  • Check the Match case option

                  • Right click on the Find All button

                  • Choose the Select > Select in Whole Document option

                  => A selection appears with the bottom message Selected 207 matches

                  • Without doing anything else, I use the Ctrl + C shortcut

                  After opening another new tab, I was quite surprised that the 207 Tibetan chars were not pasted, after a Ctrl + V operation ?!

                  Then, I understood that the selection is effective ONLY IF :

                  • The Search++ plugin is closed with the x button or using the ESC key

                  • You click again on the New 1 tab, with Search++ not on focus

                  • You move the New 1 text one line Up or Down with the ▲ or ▼ marks of the vertical scroll bar

                  @coises, is this behaviour correct ?


                  Regarding the Unicode Word boundaries :

                  I had a look at https://www.regular-expressions.info/unicodeboundaries.html#word

                  I understood that :

                  • When ICU is selected and Unicode word boundaries is not checked, the \b regex, against our Tibetan text above, counts 46 matches

                  • When ICU is selected and Unicode word boundaries is checked, the \b regex, against our Tibetan text above, counts 176 matches

                  Quite different, indeed ! Note that if the Unicode word boundaries is not checked , the (?w)\b regex would also return 176 matches. Thus, a leading (?w) forces the use of the Unicode word boundaries option !

                  Then, reading https://www.regular-expressions.info/unicodeboundaries.html#grapheme, I realized that, presently, the \b regex cannot identify the different grapheme positions !

                  Would it be possible to add an option for this specific case, or am I asking too much ? I suppose the latter is true !!

                  Best Regards,

                  guy038

                  • Coises @guy038

                    @guy038 said in Search++: A work in progress:

                    Thanks for your new Search++_03 release !

                    Thank you for testing it.

                    BTW, with native N++ search, the Shift + Enter shortcut is also available when you choose the Regular expression search mode ( with the condition that the regexBackward4PowerUser="yes" option is present within the config.xml file ). Maybe you could allow it as well in Search++ ?

                    Regex backward… I have my doubts, but I can leave it open as something I might try to make available some day. When I’ve thought about it before, I get caught up trying to define exactly what it means to match regular expressions backward. Regular expressions can match different lengths depending on where they start. Is the previous match the one that ends at the latest possible position? The one that begins at the latest possible position? The last one that would have occurred before the current position if you matched forward repeatedly from the beginning of the text? The one that would result from reversing both the text and the regular expression (but then what do you do with backreferences)?
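
                    Two of these candidate definitions can be compared in a short sketch. It uses Python's `re` module rather than Boost.Regex, and the function names are made up for illustration. Note that passing `pos` as the end position makes the engine treat the text as ending there, which affects anchors and lookahead:

```python
import re


def last_forward_match_before(pattern, text, pos):
    """Last match, scanning forward from the start, that ends at or before pos."""
    last = None
    for m in re.finditer(pattern, text):
        if m.end() > pos:
            break
        last = m
    return last


def latest_start_match_before(pattern, text, pos):
    """Match with the latest possible start that still ends at or before pos."""
    compiled = re.compile(pattern)
    for start in range(pos - 1, -1, -1):
        m = compiled.match(text, start, pos)
        if m:
            return m
    return None


text = "xaaay"
pos = 4  # caret sits just before the "y"
forward = last_forward_match_before(r"a+", text, pos)
latest = latest_start_match_before(r"a+", text, pos)
print(forward.span(), latest.span())
```

                    With the pattern a+ the forward-scanning definition yields the whole run "aaa", while the latest-start definition yields only the final "a", so even this simple case gives two different answers for "the previous match".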

                    Shift+Enter is a different problem. Enter doesn’t work to find: since the Find and Replace boxes take multiple lines, they consume the Enter key. You can use Alt+F and Alt+R (the underlined characters on the Find and Replace buttons), but those combinations are a bit awkward. I’ve been thinking of just making Shift+Enter and Ctrl+Enter do the functions on the Find and Replace buttons — I think those would be more natural than Alt+F and Alt+R for most people (including me). But then it isn’t obvious how access to backward should work. Beyond all that, there is no standard Windows mechanism for keyboard-only access to the drop-down menus on split command buttons. Once you can get to the button without clicking it, down arrow works to open the menu; but you can’t get there with Alt+underlined letter: that does the click action. I haven’t figured out a good way to deal with all of the keyboard navigation obstacles yet.

                    Which is a long way of saying I don’t know which of too many possibilities I will eventually decide must take priority for keyboard actions, so I don’t know what I can/will do in that regard.

                    Personally, I think the ICU button should remain available in future versions !

                    I’ll probably leave the function there… it might be “hidden” (like a Shift-click on Regex) so it doesn’t confuse people who would probably never use it.

                    • Choose the Select > Select in Whole Document option

                    => A selection appears with the bottom message Selected 207 matches

                    • Without doing anything else, I use the Ctrl + C shortcut

                    After opening another new tab, I was quite surprised that the 207 Tibetan chars were not pasted, after a Ctrl + V operation ?!

                    Then, I understood that the selection is effective ONLY IF :

                    It’s not that selection isn’t effective, it’s that keyboard focus was still in the Search++ dialog. You have to move focus to the document for the Ctrl+C to work.

                    You can use Ctrl+N (think “Notepad++”) to return focus to the document, or (as you discovered) click on the tab if you’re using the mouse.

                    This does make me think I should probably have an option, perhaps enabled by default, to return focus to the document automatically after a select operation, since wanting to copy is probably the most common reason for using select.

                    (I’ve been bitten by this often enough in Columns++, which works the same way. It’s just so easy to forget that focus is in the dialog, not the document.)

                    Note that if the Unicode word boundaries is not checked , the (?w)\b regex would also return 176 matches. Thus, a leading (?w) forces the use of the Unicode word boundaries option !

                    Hmmm… I’m not sure what’s happening there.

                    Then, reading https://www.regular-expressions.info/unicodeboundaries.html#grapheme, I realized that, presently, the \b regex cannot identify the different grapheme positions !

                    Would it be possible to add an option for this specific case

                    In both Regex and ICU, \X matches a single grapheme cluster. In Regex, (?=\X) matches a grapheme boundary; that doesn’t work in ICU. (It looks like in ICU, \X actually matches from the current position to the end of a grapheme cluster. In Regex, the match must begin and end on a grapheme cluster boundary. The Boost.Regex logic already worked that way, but I replaced/extended it to use the grapheme break algorithm specified by Unicode.) \X partially works in built-in Notepad++ search, too, but it misses some cases and falls apart entirely outside the BMP.

                    • guy038

                      Hi, @Coises and All,

                      Don’t worry about my request for searching for regular expressions in the opposite direction: I can live without that feature !


                      Regarding the possibility of changing the Alt + F and Alt + R shortcuts to some others, I’m not really in favor of it because you would break a very old Windows standard !

                      BTW, when in Plain text and with focus on Search++, let’s suppose we search for the string and

                      • Using the Alt + F shortcut does move to the next match and, in addition, the shortcut Alt + Shift + F does move to the previous match. Nice, indeed !

                      • But, if I decide, now, to use the Enter key, it wrongly adds a line-break after the word and , in the Find dialog of Search++

                      So, to my mind, it would be better to put the focus on the Find button as soon as you hit the Alt+ F shortcut, in order to use, either, the Alt + F OR Enter and the ALT+ Shift + F OR Shift + Enter shortcuts !


                      Regarding my selection problem :

                      Oh… yes, indeed ! The solution was obvious : Ctrl + N to switch focus from the Search++ plugin to native N++ !

                      You said :

                      This does make me think I should probably have an option, perhaps enabled by default, to return focus to the document automatically after a select operation, since wanting to copy is probably the most common reason for using select.

                      Yes indeed; this would make sense !


                      Note that I personally choose the Ctrl + Shift + N shortcut for the command Plugins > Search++ > Search...

                      Thus, as a summary :

                      • When focus on Notepad++, the Ctrl + Shift + N shortcut opens or puts focus on the Search++ plugin ( User shortcut )

                      • When focus on Search++, the Ctrl + N shortcut puts focus on Notepad++ ( Search++ shortcut )

                      • When focus on Search++, the Ctrl + O shortcut toggles between the Find and the Replace dialog of Search++ ( Search++ shortcut )

                      • When focus on Search++, the Ctrl + H shortcut re-opens or puts focus on the Search++ Results ( Search++ shortcut )

                      • When focus on Search++ Results, the Ctrl + O shortcut puts focus on the Search++ plugin ( Search++ shortcut )

                      • When focus on Search++ Results, the Ctrl + Shift + N shortcut closes the Search++ Results panel and puts focus on Notepad++ ( Search++ shortcut )

                      • When focus on Search++, the Ctrl + Shift + N shortcut closes the Search++ panel and puts focus on Notepad++ ( User shortcut )

                      BTW, @coises, why didn’t you choose a single shortcut ( I’m thinking about Ctrl + H ) to toggle between the Search++ plugin and the Search++ Results ? Native Notepad++ just uses the F7 key to shift the focus, back and forth, between the Document window and the Search results panel !


                      Regarding the Unicode Word Boundaries :

                      When I said :

                      Note that if the Unicode word boundaries is not checked , the (?w)\b regex would also return 176 matches. Thus, a leading (?w) forces the use of the Unicode word boundaries option !

                      There’s nothing weird about this assertion. It just means that the (?w) and (?-w) modifiers act in the same way as the well-known (?s) and (?-s) modifiers, which set / unset the . matches new-line option, whether or not that option is physically checked, in native N++ search !
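
                      The same override behavior can be seen in other engines. As a small sketch, Python's `re` module (not Boost.Regex or ICU) lets an inline modifier win over the flag passed by the caller, which plays the role of the checkbox here:

```python
import re

text = "a\nb"

# No flag and no modifier: "." does not match the newline.
plain = re.search(r"a.b", text)

# A leading (?s) turns dot-matches-newline on, although no flag was passed.
forced_on = re.search(r"(?s)a.b", text)

# A scoped (?-s:...) turns it off again, although re.DOTALL was passed.
forced_off = re.search(r"(?-s:a.b)", text, re.DOTALL)

print(plain is None, forced_on is not None, forced_off is None)
```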


                      • Regarding the Grapheme Boundaries :

                      No need to add any option ! I’ve just realized that, in regex mode, the simple regex (?!\X). does match any character which is not a Grapheme-Base char. Thus, a Count action would detect, for instance, the total number of accented characters associated with a simple Latin letter !
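
                      For comparison, a much cruder count can be made outside Search++ with Python's standard `unicodedata` module. This sketch simply counts code points whose general category is a Mark (Mn, Mc or Me). It is an assumption that this approximates what (?!\X). counts for simple accented Latin text; the full Unicode grapheme break algorithm has many more rules (Hangul syllables, emoji sequences and so on), so the two will not always agree:

```python
import unicodedata


def count_combining_marks(text):
    """Count code points whose general category is a Mark (Mn, Mc or Me)."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("M"))


# "e" + acute accent, "a" + diaeresis + dot below, then a plain "x"
sample = "e\u0301a\u0308\u0323x"
print(count_combining_marks(sample))
```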


                      I still need to explore all of Search++'s features and, most importantly, to compile a list of the various properties available with the ICU regular expression engine.

                      BR

                      guy038

                      • guy038

                        Hello, @coises and All,

                        Two more points :

                        • Open the change.log file

                        Let’s suppose that the N++ Find dialog is already opened and that the Find field contains the text This is a test

                        • Now, switch back to the change.log file

                        • Select the Updater (Installer only): text

                        • Use the Ctrl + F shortcut

                        => The previous text is updated to the new text to search : Updater (Installer only): => OK

                        • Now, open Search++ ( with Plugins > Search++ > Search... or with my shortcut Ctrl + Shift + N )

                        • Type in This is a test in the Find dialog of Search++

                        • Click on the change.log tab

                        • Select again the Updater (Installer only): text

                        • Put the focus again on the Search++ plugin ( with Plugins > Search++ > Search... or with my shortcut Ctrl + Shift + N )

                        => The text is not updated and remains the string This is a test ! To get it updated, you need to close and re-open Search++

                        Could you provide this N++ search behavior to Search++, as well ?


                        When the Search++ dialog is docked, it’s very easy to identify if focus is on Notepad++ or on Search++, thanks to the blue color of the title bar. However, this difference is not so obvious when the Search++ plugin is not docked ! Is there a way to make this difference clearer ?

                        Best Regards,

                        guy038

                        • Coises @guy038

                          @guy038 said in Search++: A work in progress:

                          Regarding the possibility of changing the Alt + F and Alt + R shortcuts to some others, I’m not really in favor of it because you would break a very old Windows standard !

                          True. I would not change them; I was thinking about adding Shift+Enter and Ctrl+Enter as alternatives.

                          • Using the Alt + F shortcut does move to the next match and, in addition, the shortcut Alt + Shift + F does move to the previous match. Nice, indeed !

                          That was a “bonus.” It hadn’t occurred to me that Shift would work that way, though now that you mention it, I can see why it does.

                          • But, if I decide, now, to use the Enter key, it wrongly adds a line-break after the word and , in the Find dialog of Search++

                          So, to my mind, it would be better to put the focus on the Find button as soon as you hit the Alt+ F shortcut, in order to use, either, the Alt + F OR Enter and the ALT+ Shift + F OR Shift + Enter shortcuts !

                          The Windows dialog manager doesn’t normally move focus for an Alt+ shortcut to a command button, it just does the command and leaves the focus unchanged. I could probably find a way to override that, but it’s not clear to me that I should. If you started using Alt+F, why not continue that way?

                          This does make me think I should probably have an option, perhaps enabled by default, to return focus to the document automatically after a select operation, since wanting to copy is probably the most common reason for using select.

                          Yes indeed; this would make sense !

                          It will be in the next release.

                          BTW, @coises, why didn’t you choose a single shortcut ( I’m thinking about Ctrl + H ) to toggle between the Search++ plugin and the Search++ Results ? Native Notepad++ just uses the F7 key to shift the focus, back and forth, between the Document window and the Search results panel !

                          I suppose I was thinking that since I can’t really make one command that handles all the focus changes — because I don’t know what the user will assign for Search++/Search in the Notepad++ shortcut mapper (if anything at all) — I would have Ctrl+N always go to the document, Ctrl+O always go to the Search dialog (though it’s Find Box unless you’re already in the Find Box, in which case it’s the Replace Box…) and Ctrl+H always go to the results (“hit list”).

                          It would make at least as much sense to have Ctrl+H toggle between the search dialog and the results list, though. Since you mention it, I’ll probably make that change.

                          There’s nothing weird about this assertion. It just means that the (?w) and (?-w) modifiers act in the same way

                          Of course, you are correct.
