    Search for accented words.

    • SoCu
      SoCu last edited by

      How to search for a text in accented words.

      For example, if I have the word “Comparación”, and in the search I type “Comparacion”, how can I make it show me all the words whether they are accented or not.

      I have the option “Regular expression” checked, but it does not show it.

      Thank you.

      • PeterJones
        PeterJones @SoCu last edited by

        @socu ,

        Search for Comparaci[[=o=]]n, in regular expression mode, to find either Comparación or Comparacion

        This syntax is called an equivalence class.

        So if you wanted to search for the accented version of any of the vowels or the n in that word for some reason, it would be C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]][[=n=]]
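For readers who want to reproduce the effect outside Notepad++, here is a minimal sketch in Python. Python's built-in `re` module does not support the `[[=o=]]` syntax, so this sketch swaps in a different technique: it strips combining accents with `unicodedata` before comparing. The helper names `strip_accents` and `find_accent_insensitive` are hypothetical, and the result is only an approximation of what an equivalence class matches (for example, `ø` has no canonical decomposition and is not treated as `o` here).

```python
import re
import unicodedata


def strip_accents(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_accent_insensitive(needle, haystack):
    """Return the words of haystack that equal needle once accents are ignored.

    The comparison is case-sensitive; only accents are ignored.
    """
    target = strip_accents(needle)
    words = re.findall(r"\w+", haystack)
    return [word for word in words if strip_accents(word) == target]


text = "Comparación y Comparacion, pero no Comparaciones"
matches = find_accent_insensitive("Comparacion", text)
print(matches)
```

Both the accented and the unaccented spelling are returned, while `Comparaciones` is not, because the whole word is compared.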

        • guy038
          guy038 last edited by guy038

          Hello, @socu and All,

          First, here are two regexes which help you to see where your file is likely to contain accented characters :

          • In a Unicode encoded file ( so in all encoding options but ANSI ) :

            • Open the Mark dialog ( Ctrl + M )

            • SEARCH (?-i)[\x{00C0}-\x{024F}]

            • Untick all options

            • Tick the Purge for each search option

            • Tick the Wrap around option

            • Select the Regular expression search mode

            • Click on the Mark All button

          • In an ANSI encoded file :

            • Open the Mark dialog ( Ctrl + M )

            • SEARCH (?i)[\x8A\x8E\x9A\x9E\xC0-\xFF]

            • Untick all options

            • Tick the Purge for each search option

            • Tick the Wrap around option

            • Select the Regular expression search mode

            • Click on the Mark All button
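The Unicode range used above can be tried outside Notepad++ as well. The following is a small sketch in Python, assuming the text is an ordinary Unicode string; the function name `mark_accented` is hypothetical. It reports the position of every character between U+00C0 and U+024F. Note that this range also contains a few non-letters such as `×` (U+00D7) and `÷` (U+00F7), so a hit is a strong hint rather than proof of an accented letter.

```python
import re

# Same range as the Unicode regex above: U+00C0 to U+024F.
ACCENTED_RANGE = re.compile(r"[\u00C0-\u024F]")


def mark_accented(text):
    """Return (offset, character) pairs for every character in the range."""
    return [(m.start(), m.group()) for m in ACCENTED_RANGE.finditer(text)]


sample = "Comparación, cömparÅciõn, plain"
hits = mark_accented(sample)
for offset, char in hits:
    print(offset, char)
```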


          As explained by @peterjones, the general method to find any vowel, accented or not, is to use the regex equivalence class syntax, below :

          [[=vowel=]]. Of course, you must replace the string vowel with the exact single vowel, accented or not, that you want to search for !

          Now, this may be tedious to type when you want to find every form of a specific word !

          So, here is a work-around which enables you to search for any form of a specific word :


          • Select the specific word, which may contain one or several accentuated characters

          • Open the Replace dialog ( Ctrl + H )

          • Wipe out the SEARCH field

          • SEARCH (?i)([aeiouy])|\w

          • REPLACE ?1[[=$0=]]:$0

          • Untick all options

          • Tick the Wrap around option

          • Tick the In selection option ( IMPORTANT )

          • Select the Regular expression search mode

          • Click once on the Replace All button ( Do not use the Replace button )

          => A new string should be selected

          • Hit the Esc key to close the Replace dialog

          • Open the Mark dialog ( Ctrl + M )

          => The string, previously selected, should be automatically written in the SEARCH field

          • ( SEARCH C[[=o=]]mp[[=a=]]r[[=a=]]c[[=i=]][[=o=]]n )

          • Untick all options

          • If preferred, tick the Bookmark line option

          • Tick the Purge for each search option

          • Tick the Wrap around option

          • Select the Regular expression search mode

          • Click on the Mark All button

          => This regex should find any comparacion word, whatever its case and whether or not its vowels are accented, throughout the entire file !

          For instance, it would mark all the strings, below, based on the root comparacion :

          comparacion
          cÒmparación
          CompàraciÔn
          cömparÅciõn
          Compâraciøn
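The replace step of this work-around can be sketched in Python to show what it produces. This only builds the pattern text to paste into Notepad++; Python's own `re` module cannot run the resulting `[[=x=]]` pattern. The function name `to_equivalence_pattern` is hypothetical, and the sketch assumes the input word is typed without accents, since `[aeiouy]` in Python matches unaccented vowels only.

```python
import re


def to_equivalence_pattern(word):
    """Wrap each plain vowel in [[=...=]] and leave other characters as they are."""

    def wrap_vowel(match):
        # Group 1 is set only when the character is one of a, e, i, o, u, y.
        if match.group(1):
            return "[[=" + match.group(0) + "=]]"
        return match.group(0)

    return re.sub(r"(?i)([aeiouy])|\w", wrap_vowel, word)


pattern_text = to_equivalence_pattern("Comparacion")
print(pattern_text)
```

The printed text is the same string that the Replace All step leaves selected for the Mark dialog.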
          

          Best Regards,

          guy038

          • SoCu
            SoCu @guy038 last edited by

            Thanks. I thought these searches would be easier to perform. In practice, having to type [[=x=]] for each vowel in a word is not practical and can be a waste of time.

            Maybe this is something that could be changed in a future update, so that this type of search does not require filling words with so many extra characters.

            • PeterJones
              PeterJones @SoCu last edited by

              @socu said in Search for accented words.:

              Thanks. I thought these searches would be easier to perform. In practice, having to type [[=x=]] for each vowel in a word is not practical and can be a waste of time.

              Maybe this is something that could be changed in a future update, so that this type of search does not require filling words with so many extra characters.

              That is standard behavior in every regular expression engine that I have used in my 25+ years of working with them: if you want to match a single literal character, you type that literal character; if you want to match something more complicated (like a list of potential characters, predefined or not), you have to use special syntax to invoke that mode. Notepad++ uses a pre-built regular expression engine rather than writing its own, because the developers wanted to focus on the interesting things, not on designing yet another regular expression engine from the ground up. So even if this Forum were the feature request tracker (and it’s not, as explained in “Please Read This Before Posting” and “Feature Request and Bug Report”), I would bet that the developers would not implement such a request. Moreover, I would lobby against such a change, because it would break decades of expectation that when you say “search for o”, it searches for the literal character o, and not o plus some accented o-like characters.
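The distinction between a literal character and special syntax can be illustrated with a short sketch in Python's `re` module, used here only as a convenient stand-in for a typical regex engine. The sample string and the hand-written character class are illustrative assumptions.

```python
import re

sample = "o ó ò"

# A literal "o" matches only the plain letter o.
literal_hits = re.findall(r"o", sample)

# Matching the accented variants requires saying so explicitly,
# here with a hand-written character class.
class_hits = re.findall(r"[oóòôöõ]", sample)

print(literal_hits)
print(class_hits)
```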

              • SoCu
                SoCu @PeterJones last edited by

                I understand. It is clear that I am not very knowledgeable; I thought that you could add some exceptions to the search engine, such as accented characters, so that it ignores them when performing a search.

                Thank you.

                • PeterJones
                  PeterJones @SoCu last edited by PeterJones

                  @socu ,

                  It would make sense if there were an “accent-insensitive” flag in the standard regex engines, just like there is a “case-insensitive” flag. But no regex engine that I’ve ever used has had such a flag. Some of those engines have decades of development behind them (for example, the Boost regex engine used by Notepad++ is Perl-compatible, following a syntax that has its roots in 1990s Perl regular expressions), most of it with Unicode awareness, and I have often seen “is there an accent-insensitive flag for regex-flavor-X” questions answered in the negative in programming forums. So I would assume that if such a flag were technically reasonable, it would have been developed and included in the major engines by now. Since it hasn’t been, I am assuming there is a significant technical roadblock that’s beyond my pay grade to understand.
