    Using sets to find A-Za-z plus the # and - chars ..?

    • IanSunlun @PeterJones

      @PeterJones Ah, that seems to work, thanks.
      Does [\w%#.~-]+ put whatever it matches into ${1} ?

          • PeterJones @IanSunlun

            @IanSunlun said in Using sets to find A-Za-z plus the # and - chars ..?:

            Does [\w%#.~-]+ put whatever it matches into ${1} ?

            Sorry, when I answered, I had forgotten that you previously said,

            (So I need to store pagename in ${1} and bookmark in ${2}.)

            Putting the # into either match is not what you want, either. You really need two groups, one before the # and one after.

            FIND = http://mysitename.net/index.php/([\w%.~-]+)#([\w%.~-]+)"
            will only match if there is a bookmark, and the # will not be inside the ${2} group. If you want the # to be included in ${2}, use http://mysitename.net/index.php/([\w%.~-]+)(#[\w%.~-]+)"
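For anyone who wants to check the two-group idea outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module handles these simple constructs the same way as the Boost engine in Notepad++, and the sample `text` string is made up for illustration. In Python the groups are read with `m.group(1)` and `m.group(2)` rather than ${1} and ${2}.

```python
import re

# FIND patterns from the post above, with the dots left unescaped as posted.
two_groups = re.compile(r'http://mysitename.net/index.php/([\w%.~-]+)#([\w%.~-]+)"')
hash_in_group = re.compile(r'http://mysitename.net/index.php/([\w%.~-]+)(#[\w%.~-]+)"')

# Hypothetical input line, not taken from the original poster's file.
text = '<a href="http://mysitename.net/index.php/New_Video#column-one">'

m = two_groups.search(text)
page, bookmark = m.group(1), m.group(2)   # pagename, bookmark without '#'

m2 = hash_in_group.search(text)
bookmark_with_hash = m2.group(2)          # bookmark including the '#'

# A link without a bookmark is not matched by the two-group pattern.
no_bookmark = two_groups.search('<a href="http://mysitename.net/index.php/New_Video">')

print(page, bookmark, bookmark_with_hash, no_bookmark)
```

Without the parentheses there would be no group 1 or group 2 at all; the character class only says which characters are allowed, and the parentheses do the capturing.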

            • IanSunlun @PeterJones

              @PeterJones said in Using sets to find A-Za-z plus the # and - chars ..?:

              FIND = http://mysitename.net/index.php/([\w%.~-]+)#([\w%.~-]+)"

              With the period . in between the % and the ~ it did not find:
              http://mysitename.net/index.php/New_Video#column-one"
              But taking the period out, it did find it.
              What's the thinking behind the period in this context?

              • PeterJones @IanSunlun

                @IanSunlun ,

                Except for -, order doesn’t matter inside the [] character class. The period is there because New.Video#column-one is also a valid way for the URL to end.

                FIND = http://mysitename.net/index.php/([\w%.~-]+)#([\w%.~-]+)"
                does match http://mysitename.net/index.php/New_Video#column-one":

                [Screenshot attached: 2fb36c05-cd1f-406d-92f6-ec71aec5bb2a-image.png]
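The point about ordering can be checked with a small Python sketch. This assumes Python's `re` behaves like Notepad++'s engine for these classes; the reordered class and the `New.Video` test string are illustrative choices, not something from the thread's data.

```python
import re

# Same members, different order; the '-' is kept at an edge so it stays literal.
as_posted = re.fullmatch(r"[\w%.~-]+", "New_Video")
reordered = re.fullmatch(r"[-~.%\w]+", "New_Video")

# The period inside the class is a literal period. It only matters for
# page names that actually contain one.
with_period = re.fullmatch(r"[\w%.~-]+", "New.Video")
without_period = re.fullmatch(r"[\w%~-]+", "New.Video")

print(bool(as_posted), bool(reordered), bool(with_period), bool(without_period))
```

So including the period cannot stop New_Video from matching; it only widens what the class accepts.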

                • Alan Kilborn @PeterJones

                  @PeterJones said in Using sets to find A-Za-z plus the # and - chars ..?:

                  FIND = http://mysitename.net/index.php/([\w%.~-]+)#([\w%.~-]+)"

                  Is it worth pointing out that the first two periods here really aren’t periods but rather “match any char”, because they aren’t escaped? Sure, an unescaped . will match a literal period, but it will match other things as well (obviously).
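A minimal Python sketch of the difference, assuming Python's `re` treats an unescaped dot outside a character class the way Notepad++ does (any character except a line break). The "wrong" URL text below is invented for the demonstration.

```python
import re

unescaped = re.compile(r"mysitename.net/index.php")
escaped = re.compile(r"mysitename\.net/index\.php")

real = "http://mysitename.net/index.php/New_Video"
lookalike = "http://mysitenameXnet/index-php/New_Video"  # hypothetical text

# Both patterns find the real URL ...
real_unescaped = bool(unescaped.search(real))
real_escaped = bool(escaped.search(real))

# ... but only the unescaped one also accepts the lookalike.
lookalike_unescaped = bool(unescaped.search(lookalike))
lookalike_escaped = bool(escaped.search(lookalike))

print(real_unescaped, real_escaped, lookalike_unescaped, lookalike_escaped)
```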

                  IMO, OP here needs to stop asking forum questions and go off and study regex.

                  • guy038

                    Hello, @peterjones,

                    In the post below, Peter :

                    https://community.notepad-plus-plus.org/post/81643

                    You said :

                    Actually, it’s not documented in our character classes section. I will remedy that.

                    Then, regarding the character class feature, maybe this part could be added to the official Notepad++ documentation:

                    If we consider the following CHARACTER CLASS structure :
                    
                    [.......]
                    123456789
                    
                    The POSSIBLE location(s), in order to find the LITERAL character below, are :
                    
                    LITERAL Character [    :     POSSIBLE at any position, FROM 2 TO 8
                                                 POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character

                    LITERAL Character ]    :     POSSIBLE at position 2 ONLY
                                                 POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character

                    LITERAL Character -    :     POSSIBLE at position 2
                                                 POSSIBLE at position 8
                                                 POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character

                    LITERAL Character \    :     POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character
                    

                    Of course, change this layout as you like !

                    Best Regards,

                    guy038

                    • Alan Kilborn @guy038

                      @guy038

                      It is rather awkward to express, but I like your idea.

                      My idea for expression:

                      • To use a “literal [” in a character class: Use it directly like any other character, e.g. [ab[c]; “escaping” is not necessary (but is permissible), e.g. [ab\[c]

                      • To use a “literal ]” in a character class: Directly right after the opening [ of the class notation, e.g. []abc], OR “escaped” at any position, e.g. [\]abc] or [a\]bc]

                      • To use a “literal -” in a character class: Directly as the first or last character in the enclosing class notation, e.g. [-abc] or [abc-], OR “escaped” at any position, e.g. [\-abc] or [a\-bc]

                      • To use a “literal \” in a character class: Must be doubled (i.e., \\) inside the enclosing class notation, e.g. [ab\\c]
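The four rules above can be tried out with a short Python sketch. Note the assumption: this uses Python's `re`, not the Boost engine inside Notepad++; as far as I know the two agree on these particular cases, but that is not verified here. The helper `members` and the candidate characters are made up for the demonstration.

```python
import re

def members(char_class, candidates="ab[c]-\\x"):
    """Return the candidate characters that the given class matches."""
    return {ch for ch in candidates if re.fullmatch(char_class, ch)}

# Literal '[': directly, or escaped.
open_direct = members(r"[ab[c]")
open_escaped = members(r"[ab\[c]")

# Literal ']': first in the class, or escaped anywhere.
close_first = members(r"[]abc]")
close_escaped = members(r"[a\]bc]")

# Literal '-': first, last, or escaped anywhere.
dash_first = members(r"[-abc]")
dash_last = members(r"[abc-]")
dash_escaped = members(r"[a\-bc]")

# Literal backslash: must be doubled.
backslash = members(r"[ab\\c]")

for result in (open_direct, close_first, dash_first, backslash):
    print(sorted(result))
```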

                      • PeterJones @Alan Kilborn

                        @Alan-Kilborn & @guy038 ,

                        I like those suggestions, especially the way Alan rephrased it: it works much better than my clunky first attempt in the manual, which only included - and was not very readable.

                        Thanks.

                        • Alan Kilborn @PeterJones

                          @PeterJones

                          Maybe my first-of-4 bullet points previously should be moved to be the last-of-4, and changed to:

                          • To use any other literal character in a character class, just use it directly, i.e., no “escaping” needed

                          Maybe it works well as a 2 column 4 row table, headers:

                          • Character
                          • To use it literally in a character class

                          With those headers, the “cell contents” for column 2 could be appropriately shortened to remove redundant verbiage.

                          • guy038

                            Hi, @peterjones,

                            BTW, Peter, do you intend to include, in some way, the end part of this post, regarding the Free-space mode, which is in the Notes section ?

                            https://community.notepad-plus-plus.org/post/81368


                            Also, did you correctly receive, by e-mail, my attached text file, regarding the TextFX features ?

                            Please, I do not want to stress you, unnecessarily ! Just go at your own pace !

                            Best Regards

                            guy038

                            • Alan Kilborn @guy038

                              @guy038 said in Using sets to find A-Za-z plus the # and - chars ..?:

                              do you intend to include, in some way, the end part of this post, regarding the Free-space mode

                              He already did, see HERE.

                              • Andrew McP @Alan Kilborn

                                @Alan-Kilborn I really admire you guys for figuring out Regular Expressions; I bet you never get lost in real life when you can keep track of the patterns/positions so well, aka good spatial awareness :)

                                Oh and I like the trick of having - as last character before ]

                                • Alan Kilborn @Andrew McP

                                  @Andrew-McP said in Using sets to find A-Za-z plus the # and - chars ..?:

                                  I really admire you guys for figuring out Regular Expressions

                                  So if someone says they have “figured out regular expressions”, I pity them. Because it just means they are ripe for an upcoming whipping when a regex misunderstanding of theirs really embarrasses them. :-)

                                  It pays to always be humble when discussing regular expressions with others. :-)

                                  I bet you never get lost

                                  GPS!

                                  I like the trick of having - as last character before ]

                                  Not so much a trick, as a logical place to put it when you realize that anywhere except the first or last position it must form some sort of “range”.
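To see why the position of - matters, here is a small Python sketch (again assuming Python's `re` agrees with Notepad++ on this point). The class `[%-~]` is a deliberately bad rearrangement of the characters used earlier in the thread, chosen only to show the range effect.

```python
import re

dash_last = r"[%~-]"     # three literals: '%', '~' and '-'
dash_middle = r"[%-~]"   # a range from '%' (0x25) to '~' (0x7E)

# 'A' and '5' are not among the three literals ...
last_A = bool(re.fullmatch(dash_last, "A"))
last_5 = bool(re.fullmatch(dash_last, "5"))

# ... but both fall inside the accidental range.
middle_A = bool(re.fullmatch(dash_middle, "A"))
middle_5 = bool(re.fullmatch(dash_middle, "5"))

print(last_A, last_5, middle_A, middle_5)
```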

                                  • Andrew McP @Alan Kilborn

                                    @Alan-Kilborn hahahah yes no way would I bet my house on any regular expression I recommend covering all, no matter how perverse, eventualities…

                                    • guy038

                                      Hello, @peterjones,

                                      In my previous post, I forgot to mention the ^ character, which has a special meaning within a Character class !

                                      So, here is an updated version of my previous post :

                                      If we consider the following CHARACTER CLASS structure :
                                      
                                      [.......]
                                      123456789
                                      
                                      The POSSIBLE location(s), in order to find the LITERAL character below, are :
                                      
                                      LITERAL Character [    :     POSSIBLE at any position, FROM 2 TO 8
                                                                   POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character

                                      LITERAL Character ]    :     POSSIBLE at position 2 ONLY
                                                                   POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character

                                      LITERAL Character -    :     POSSIBLE at position 2
                                                                   POSSIBLE at position 8
                                                                   POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character

                                      LITERAL Character ^    :     POSSIBLE at any position, FROM 3 TO 8
                                                                   POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character

                                      LITERAL Character \    :     POSSIBLE at any position, FROM 2 TO 8, if PRECEDED by a BACKSLASH character
                                      

                                      And I suppose that @alan-kilborn could add :

                                      To use a “literal ^” in a character class: Use it directly like any other character, e.g. [ab^c], but not right after the opening [ of the class notation; “escaping” is not necessary (but is permissible), e.g. [ab\^c]
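A quick Python sketch of the ^ rule, under the same assumption as before that Python's `re` matches Notepad++'s behaviour here. The helper name `matches` is hypothetical.

```python
import re

def matches(char_class, ch):
    """True if the single character ch is matched by the class."""
    return re.fullmatch(char_class, ch) is not None

# '^' anywhere except right after '[' is a literal caret.
caret_inside = matches(r"[ab^c]", "^")
letter_inside = matches(r"[ab^c]", "a")

# Escaped, it is literal even in the first position.
caret_escaped_first = matches(r"[\^abc]", "^")

# Unescaped right after '[', it negates the class instead.
negated_a = matches(r"[^abc]", "a")
negated_x = matches(r"[^abc]", "x")

print(caret_inside, letter_inside, caret_escaped_first, negated_a, negated_x)
```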

                                      Best Regards,

                                      guy038
