    Search for accented words.

    • SoCuS
      SoCu
      last edited by

      How can I search for a text that contains accented words?

      For example, if I have the word “Comparación” and I type “Comparacion” in the search, how can I make it show me all occurrences of the word, whether they are accented or not?

      I have the “Regular expression” option checked, but it does not find the accented form.

      Thank you.

      • PeterJonesP
        PeterJones @SoCu
        last edited by

        @socu ,

        Search for Comparaci[[=o=]]n, in regular expression mode, to find either Comparación or Comparacion.

        The [[=o=]] syntax is called an equivalence class.

        So if you wanted to search for the accented version of any of the vowels or the n in that word for some reason, it would be C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]][[=n=]]
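As a rough illustration outside Notepad++: Python's `re` module has no `[[=o=]]` syntax, so the sketch below approximates the idea with a hand-written character class that lists only `o` and `ó`. It is an approximation of the concept, not of the Boost equivalence class itself, which covers more variants.

```python
import re

# Python's re has no [[=o=]] syntax, so the variants are listed by hand.
# This class only covers o and ó; a real equivalence class covers more.
pattern = re.compile(r"Comparaci[oó]n")

words = ["Comparación", "Comparacion", "Comparasion"]
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

The third word is there only to show that the rest of the pattern is still matched literally.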

        • guy038G
          guy038
          last edited by guy038

          Hello, @socu and All,

          First, here are two regexes that help you see where your file is likely to contain accented characters:

          • In a Unicode encoded file ( so in all encoding options but ANSI ) :

            • Open the Mark dialog ( Ctrl + M )

            • SEARCH (?-i)[\x{00C0}-\x{024F}]

            • Untick all options

            • Tick the Purge for each search option

            • Tick the Wrap around option

            • Select the Regular expression search mode

            • Click on the Mark All button

          • In an ANSI encoded file :

            • Open the Mark dialog ( Ctrl + M )

            • SEARCH (?i)[\x8A\x8E\x9A\x9E\xC0-\xFF]

            • Untick all options

            • Tick the Purge for each search option

            • Tick the Wrap around option

            • Select the Regular expression search mode

            • Click on the Mark All button
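For readers who want to run the same check in a script, here is a minimal Python sketch that uses the same code point range as the Unicode-file regex above (U+00C0 to U+024F). The function name `find_accented` is made up for this example and is not part of Notepad++.

```python
import re

# Same code point range as the Unicode-file regex above: U+00C0 to U+024F.
ACCENTED = re.compile(r"[\u00C0-\u024F]")

def find_accented(text):
    """Return (offset, character) pairs for every character in the range."""
    return [(m.start(), m.group()) for m in ACCENTED.finditer(text)]

hits = find_accented("Comparación y cÒmparacion")
print(hits)
```

Note that the range is a block of code points, not a list of letters: it also contains the signs × (U+00D7) and ÷ (U+00F7), which is one reason the regex only tells you where accented characters are likely to be.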


          As explained by @peterjones, the general method to find any vowel, accented or not, is to use the regex equivalence class syntax, below:

          [[=vowel=]], where you replace the string vowel with the single vowel, accented or not, that you want to search for.

          Now, this may be tedious to type by hand when you want to find every form of a specific word.

          So, here is a work-around which enables you to search for any form of a specific word:


          • Select the specific word, which may contain one or several accentuated characters

          • Open the Replace dialog ( Ctrl + H )

          • Wipe out the SEARCH field

          • SEARCH (?i)([aeiouy])|\w

          • REPLACE ?1[[=$0=]]:$0

          • Untick all options

          • Tick the Wrap around option

          • Tick the In selection option ( IMPORTANT )

          • Select the Regular expression search mode

          • Click once on the Replace All button ( Do not use the Replace button )

          => A new string should be selected

          • Hit the Esc key to close the Replace dialog

          • Open the Mark dialog ( Ctrl + M )

          => The string, previously selected, should be automatically written in the SEARCH field

          • ( SEARCH C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]]n )

          • Untick all options

          • If preferred, tick the Bookmark line option

          • Tick the Purge for each search option

          • Tick the Wrap around option

          • Select the Regular expression search mode

          • Click on the Mark All button

          => This regex should find every form of the word comparacion throughout the entire file, whatever its case and whether or not its vowels are accented.

          For instance, it would mark all the strings below, based on the root comparacion:

          comparacion
          cÒmparación
          CompàraciÔn
          cömparÅciõn
          Compâraciøn
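The Replace All step in the work-around above can also be sketched as a small script that generates the search string. This is only an illustration: the helper names `base_letter` and `to_equivalence_pattern` are hypothetical, the input is assumed to be a plain word in precomposed form with no regex metacharacters, and, unlike the SEARCH regex in the work-around, the sketch folds an accented vowel to its base letter before wrapping it.

```python
import unicodedata

VOWELS = set("aeiouy")

def base_letter(ch):
    """Return the first code point of the NFD form (ó -> o); others unchanged."""
    return unicodedata.normalize("NFD", ch)[0]

def to_equivalence_pattern(word):
    """Wrap every vowel, accented or not, in [[=x=]]; copy other characters."""
    out = []
    for ch in word:
        base = base_letter(ch)
        if base.lower() in VOWELS:
            out.append("[[=" + base + "=]]")
        else:
            out.append(ch)
    return "".join(out)

pattern = to_equivalence_pattern("Comparación")
print(pattern)
```

The printed string can then be pasted into the SEARCH field of the Mark dialog, in place of the string produced by the Replace All step.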
          

          Best Regards,

          guy038

          • SoCuS
            SoCu @guy038
            last edited by

            Thanks, I thought that these searches would be easier to perform, the truth is that it is not practical to have to put so many characters [[=x=]] for each vowel in the word, it can be a waste of time.

            Maybe it is something that needs to be changed, you could think about it for future updates, to be able to perform this type of searches so as not to fill the words with so many characters.

            • PeterJonesP
              PeterJones @SoCu
              last edited by

              @socu said in Search for accented words.:

              Thanks, I thought that these searches would be easier to perform, the truth is that it is not practical to have to put so many characters [[=x=]] for each vowel in the word, it can be a waste of time.

              Maybe it is something that needs to be changed, you could think about it for future updates, to be able to perform this type of searches so as not to fill the words with so many characters.

              That is standard behavior in every regular expression engine that I have used in my 25+ years of using them: if you want to match a single literal character, you type that literal character; if you want to match something more complicated (like a list of potential characters, predefined or not), then you have to use special syntax to invoke that mode. The Notepad++ application uses a pre-built regular expression engine rather than its own, because the developers wanted to focus on the interesting things, not on designing yet another regular expression engine from the ground up. So even if this Forum were the feature request tracker (and it’s not, as explained in “Please Read This Before Posting” and “Feature Request and Bug Report”), I would bet that the developers would not implement such a request. Moreover, I would lobby against such a change, because it would break decades of expectation that when you say “search for o”, it searches for the literal character o, and not for o plus some accented o-like characters.

              • SoCuS
                SoCu @PeterJones
                last edited by

                I understand. It is clear that I am not very knowledgeable. I thought that you could add some exceptions to the search engine, such as accented characters, so that it does not take them into account when performing a search.

                Thank you.

                • PeterJonesP
                  PeterJones @SoCu
                  last edited by PeterJones

                  @socu ,

                  It would make sense if there were an “accent-insensitive” flag in the standard regex engines, just like there’s a “case-insensitive” flag. But no regex engine that I’ve ever used has had such a flag. Some of those engines have decades of development behind them, most of which has included knowing about Unicode (for example, the Boost regex engine used by Notepad++ implements Perl-style syntax, which has its roots in late-90s Perl regular expressions). I have also seen many “is there an accent-insensitive flag for regex-flavor-X” questions answered in the negative in programming forums. So I would assume that if it were technically reasonable to include, it would have been developed and included in the major engines by now. Given that it hasn’t been developed, I am assuming that’s because there’s a huge technical roadblock that’s beyond my pay grade to understand.
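Outside a regex engine, scripts often approximate accent-insensitive comparison by normalizing the text first. The Python sketch below is one such approximation, not something Notepad++ or Boost does: it decomposes each string with NFD, drops the combining marks, and compares the case-folded results. The function names `fold` and `same_ignoring_accents` are made up for this example.

```python
import unicodedata

def fold(text):
    """Case-fold and drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def same_ignoring_accents(a, b):
    """Compare two strings after folding case and accents."""
    return fold(a) == fold(b)

candidates = ["comparacion", "cÒmparación", "CompàraciÔn",
              "cömparÅciõn", "Compâraciøn"]
results = [same_ignoring_accents(c, "Comparacion") for c in candidates]
print(results)
```

The last candidate does not compare equal with this approach, because ø has no canonical decomposition into o plus a combining mark. Which characters count as "the same letter with an accent" is not settled by normalization alone, so a sketch like this would need an extra mapping table for such letters.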
