Cannot find the fitting regular expression for counting the right class of words...



  • Hello there!

    My name’s Wolf and i’m working on my phd-thesis right now. It’s about communication in online-games, and i’m using notepad++ for counting words and wordgroups (i’m from germany, so please forgive me for my bad english).

    Right know, i 'm trying to count a special type of posting. It has got the following structure within it:

    X = any number of any characters. I tried it with “Gilde [[:alnum:]]+ sucht”, which seems to work partially, BUT i think this expression only covers the primary key base. For my search, i need the primary key base as well as every character that differs from it, e.g. A, a, À, Á, Â, Ã, Ä, Å, A, a, à, á, â, ã, ä and å (it has to work for any other character as well).

    Sadly, i have no idea how to put that into a regular expression. Could you please help me?

    Thank you so much!

    Kind greetings
    Wolf



  • Oh, and there is an additional issue: guild names are often framed by special characters like *, -, /, , <, > etc. the [x]-Part should involve those as well…i hope someone has an idea. :)



  • Hello Quangelosaurus Rex,

    Unlike you think about, the POSIX classes [[:alpha:]] and [[:alnum:]], also, match all the Unicode Latin, Greek, Cyrillic, Hebrew and Arab accentuated letters !

    So, taking in account your specific characters, including the Space character, of your second post, we, finally, get the two regexes, below :

    Gilde [[:alpha:]/<>* -]+ sucht matches the string Gilde, followed with a space, then any non-null range of Letter, Slash, Less-Than sign, Greater-Than sign, Asterisk, Space, or Hyphen-Minus sign, followed with a Space and the string sucht

    Gilde [[:alnum:]/<>* -]+ sucht matches the string Gilde, followed with a space, then any non-null range of Number, Letter, Slash, Less-Than sign, Greater-Than sign, Asterisk, Space, or Hyphen-Minus sign, followed with a Space and the string sucht

    Best Regards,

    guy038

    P.S. :

    You’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part

    You may, either, look for valuable informations, on the sites, below :

    http://www.regular-expressions.info

    http://www.rexegg.com

    http://perldoc.perl.org/perlre.html

    To end with, feel free to ask, the N++ Community, for infos on any tricky regex that you came across OR for building any tricky regex, for a particular purpose :-))



  • Sorry for the late reply. Thank you VERY much for the awesome help! You really saved me here! =) Thank you a thousand times! =)


Log in to reply