
    REGEX - Select everything before a particular word, including the line with the word?

    • Scott Sumner

      @guy038 , I trust you will not be satisfied until you dig deeper to characterize more fully this contrary \K behavior you have discovered.

      I think you might find that having the “Wrap around” option enabled is key to the Replace button doing what it does here.

      • Neculai I. Fantanaru

        Hi, guy038. Works just fine!!

        Thank you very much!

        • Ashton Watts

          Hi @guy038 ,

          I’m hoping you can help, as the bits above were really helpful, but I still have a bit to do.

          I want to delete everything between two points in 47000 html files.

          I can insert the points using a simple find and replace, so I would be left with:

          Want to keep
          Want to keep
          Want to keep
          START-DELETING
          delete
          delete
          delete
          delete
          delete
          STOP-DELETING
          Want to keep
          Want to keep
          Want to keep
          Want to keep
          Want to keep

          Hoping you have the answer.

          regards,

          • guy038

            Hello, ashton-watts, and All,

            I assume that all your .html files are in one specific directory. So:

            • First, I strongly advise you to back up the directory containing all your .html files!

            • Start Notepad++

            • Now, open the Replace in Files dialog ( Ctrl + Shift + F )

            • Type, in the Find what: zone, the regex (?s-i).*\R\KSTART-DELETING.*STOP-DELETING\R

            • Leave the Replace with: zone EMPTY

            • Insert *.html in the Filters: zone

            • Fill the Directory: zone with the absolute path of your specific folder

            • Finally, click on the Replace in Files button

            • Click on the Yes button to confirm the replacement

            Et voilà :-))

            So, from the initial text below:

            Want to keep
            Want to keep
            Want to keep
            START-DELETING
            delete
            delete
            delete
            delete
            delete
            STOP-DELETING
            Want to keep
            Want to keep
            Want to keep
            Want to keep
            Want to keep
            

            you’ll get:

            Want to keep
            Want to keep
            Want to keep
            Want to keep
            Want to keep
            Want to keep
            Want to keep
            Want to keep
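For readers doing the same job outside Notepad++, here is a minimal Python sketch of the Replace in Files steps above. It is a translation, not guy038's regex itself: Python's re module has no \R or \K, so (?:\r\n|\n|\r) stands in for \R, and the match simply starts at START-DELETING instead of using \K. The helper names strip_block and strip_block_in_dir, and the assumption that the files are UTF-8, are hypothetical. As with the Notepad++ route, back up the folder first.

```python
import re
from pathlib import Path

# (?s) lets "." cross line breaks; the lazy ".*?" stops at the first STOP-DELETING;
# (?:\r\n|\n|\r) stands in for Notepad++'s \R, which Python's re does not support.
BLOCK = re.compile(r"(?s)START-DELETING.*?STOP-DELETING(?:\r\n|\n|\r)")


def strip_block(text: str) -> str:
    """Delete every START-DELETING ... STOP-DELETING block, marker lines included."""
    return BLOCK.sub("", text)


def strip_block_in_dir(folder: str) -> int:
    """Rewrite each *.html file in folder in place; return how many files changed.

    Hypothetical helper; assumes the files are UTF-8 encoded.
    """
    changed = 0
    for path in Path(folder).glob("*.html"):
        # newline="" keeps the existing \r\n or \n line endings untouched
        with open(path, encoding="utf-8", newline="") as fh:
            original = fh.read()
        updated = strip_block(original)
        if updated != original:
            with open(path, "w", encoding="utf-8", newline="") as fh:
                fh.write(updated)
            changed += 1
    return changed
```

The lazy .*? in this sketch removes each block separately, should a file ever contain more than one.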
            

            Important:

            It may be unnecessary to insert markers to determine the starting and ending boundaries of the range of lines to be deleted. Two possibilities:

            • The boundaries are unique and easy to isolate from the surrounding text. In that case, they could replace the generic START-DELETING and STOP-DELETING lines

            • The boundaries differ literally but follow the same template. In that case, they can be found with a regex, which would be combined with my regex above!

            So, if it’s not confidential information and if you don’t mind, give us an example of the START-DELETING and STOP-DELETING lines of your .html files! You could also send one of your files, or part of it, as an attachment to my e-mail address:

            Thanks in advance for this additional information!

            See you later,

            Best Regards

            guy038

            • Ashton Watts @guy038

              @guy038 Hi guy038,

              You are a legend. The regex search string above worked perfectly. I had already inserted the start and stop points, so it wasn’t an issue.

              Thanks very much for your help.

              regards,

              • blackburn1489 @guy038

                @guy038 Hello! Can you help me, please?

                I need to get the WORD located between another word and the suffix attached to WORD, for example:

                title = WORD_name

                After I get WORD, I need to find every WORD in the document

                and rename it to WORD_lttz.

                After that, I need to repeat the whole operation with other words, WORD1, WORD2, WORD3 and so on,
                each placed between “title =” and “_name”:

                title = WORD1_name

                and find them in the entire document and rename them to WORD1_lttz, WORD2_lttz, WORD3_lttz and so on.

                • guy038

                  Hello, @blackburn1489, and All,

                  I took some time to figure out what exactly you wanted to do, and I hope that my solution will be close enough to what you need!

                  OK, let’s suppose that we start with the sample text below :

                  title = ABC_name
                  title = DEF_name
                  title = YZ_name
                  title = GHI_name
                  title = JKL_name
                  title = MNO_name
                  title = YZ_name
                  title = ABC_name
                  title = MNO_name
                  title = MNO_name
                  title = PQR_name
                  title = MNO_name
                  title = STU_name
                  title = VWX_name
                  title = ABC_name
                  title = YZ_name
                  title = GHI_name
                  

                  Note that it contains 3 lines with the string ABC, 2 lines with the string GHI, 4 lines with the string MNO and 3 lines with the string YZ!

                  Now, let’s imagine that you want to change each string ABC, DEF… into a new string, according to the table below:

                  ABC    ->    ABC111
                  DEF    ->    DEF-22222
                  GHI    ->    GHI_GHI
                  JKL    ->    J
                  MNO    ->    mno
                  PQR    ->    000PQR
                  STU    ->    Test
                  VWX    ->    99
                  YZ     ->    Y-Z
                  

                  Then, using the following regex S/R :

                  SEARCH (?-i)title\x20=\x20(?:(ABC)|(DEF)|(GHI)|(JKL)|(MNO)|(PQR)|(STU)|(VWX)|(YZ))(?=_name)

                  REPLACE title = (?1\1111)(?2\2-22222)(?3\3_\3)(?4J)(?5\L\5)(?{6}000\6)(?7Test)(?{8}99)(?9Y-Z)_lttz

                  would simultaneously change every occurrence of these 9 strings into the new ones defined in the table above ;-))

                  So, after clicking on the Replace All button, you would get, at once, the following text :

                  title = ABC111_lttz_name
                  title = DEF-22222_lttz_name
                  title = Y-Z_lttz_name
                  title = GHI_GHI_lttz_name
                  title = J_lttz_name
                  title = mno_lttz_name
                  title = Y-Z_lttz_name
                  title = ABC111_lttz_name
                  title = mno_lttz_name
                  title = mno_lttz_name
                  title = 000PQR_lttz_name
                  title = mno_lttz_name
                  title = Test_lttz_name
                  title = 99_lttz_name
                  title = ABC111_lttz_name
                  title = Y-Z_lttz_name
                  title = GHI_GHI_lttz_name
                  

                  Et voilà !

                  Notes :

                  • Regarding the search regex :

                    • First, the (?-i) syntax forces the search to be case-sensitive ( NON-insensitive )

                    • Now, the part title\x20=\x20 tries to match the string title =, with a space character, before and after the equal sign

                    • Then, the (?: syntax starts a non-capturing group

                    • The part (ABC)|(DEF)|(GHI)|(JKL)|(MNO)|(PQR)|(STU)|(VWX)|(YZ) is simply 9 alternatives, corresponding to our 9 strings to be changed. As each of them is enclosed in parentheses, it is stored as group 1, 2, 3…

                    • The final part )(?=_name) corresponds to the closing parenthesis of the non-capturing group, followed by a look-ahead structure, or condition ( is the string _name present after ABC, DEF…? ), which must be true for an overall match

                  • Regarding the replacement regex :

                    • First, it rewrites the string title = , followed by a space character

                    • Then any (?#....) syntax, where # represents a group number, is a conditional replacement: everything after the #, up to the closing parenthesis, is evaluated only if the matched string is stored in group #

                    • Note that the 9 conditional replacement structures (?1\1111)(?2\2-22222)(?3\3_\3)(?4J)(?5\L\5)(?{6}000\6)(?7Test)(?{8}99)(?9Y-Z) could be placed in any order

                    • In some of them, we rewrite the matched string, stored in group #, thanks to the \# escape sequence

                    • In the conditional replacement (?5\L\5), we simply rewrite the upper-case string MNO in lower case, because of the \L replacement escape sequence

                    • Be aware, too, that for groups 6 and 8, the conditional replacements are built with the alternate form (?{#}....). Indeed, we must distinguish between the group number # and the digits which follow it! If the braces were absent, the regex engine would think that groups 6000 and 899 were concerned :-((

                    • And finally, of course, it rewrites, in all cases, your ending part, the string _lttz !
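Python's re module has no conditional replacement like (?1...)(?2...), so a direct port of this S/R is not possible. A common substitute, sketched below under that assumption, is one capture group plus a replacement function that looks the matched word up in a dictionary holding the table above; NEW_NAME and rename_titles are hypothetical names. As in guy038's regex, the look-ahead keeps _name out of the match, so _lttz lands just before it.

```python
import re

# The table above: old string -> new string (hypothetical stand-in for the
# nine (?N...) conditional replacements, which Python's re does not support).
NEW_NAME = {
    "ABC": "ABC111",
    "DEF": "DEF-22222",
    "GHI": "GHI_GHI",
    "JKL": "J",
    "MNO": "mno",
    "PQR": "000PQR",
    "STU": "Test",
    "VWX": "99",
    "YZ": "Y-Z",
}

# One group captures whichever key matched; (?=_name) requires _name to follow
# but leaves it outside the match, just like the Notepad++ search regex.
TITLE = re.compile(r"title = (" + "|".join(map(re.escape, NEW_NAME)) + r")(?=_name)")


def rename_titles(text: str) -> str:
    """Rewrite 'title = KEY' as 'title = NEW_lttz' wherever KEY is followed by _name."""
    return TITLE.sub(lambda m: "title = " + NEW_NAME[m.group(1)] + "_lttz", text)


print(rename_titles("title = ABC_name\ntitle = MNO_name\n"))
```

With this design, adding a tenth word is one more dictionary entry rather than a new group plus a new conditional.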

                  Best Regards,

                  guy038

                  • And Boja

                    Hi,

                    I have some e-mail addresses:

                    100km@laufwunder.at
                    100km@tus-ahrweiler.de
                    100kmbelves@free.fr
                    12ahewitt@royalschoolcavan.ie
                    12lfuller@royalschoolcavan.ie
                    12oakinlabi@royalschoolcavan.ie
                    12vkells@royalschoolcavan.ie
                    13@123.com
                    13362880852@zj165.com
                    1573364@mail.ru
                    1matoo@zoznam.sk
                    2008.lizhigang@163.com

                    I want to delete everything up to the @. Sorry for my English. I have 1 million e-mails, so I want to remove everything before the start of the domain. Example:

                    I want to turn them into this:

                    @laufwunder.at
                    @tus-ahrweiler.de
                    @free.fr
                    @royalschoolcavan.ie
                    @royalschoolcavan.ie
                    @royalschoolcavan.ie
                    @royalschoolcavan.ie
                    @123.com
                    @zj165.com
                    @mail.ru
                    @zoznam.sk
                    @163.com

                    I hope someone understands what I am trying to say :S

                    • Claudia Frank @And Boja

                      @And-Boja

                      If your real data looks like your posted data, then something like

                      Find what: ^.*?(?=@)

                      will do the job, with Replace with left empty.
                      See the FAQ for more info on regex.

                      Cheers
                      Claudia
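The same look-ahead idea carries over to Python's re module, sketched below. re.MULTILINE is needed so that ^ matches at every line start, as it does in Notepad++; keep_domains and keep_domains_in_file are hypothetical names, and the file version assumes UTF-8 input. For a million addresses, the file version streams line by line instead of holding the whole list in memory.

```python
import re

# Same idea as ^.*?(?=@): from the start of each line, lazily match everything
# up to, but not including, the first @ on that line.
LOCAL_PART = re.compile(r"^.*?(?=@)", re.MULTILINE)


def keep_domains(text: str) -> str:
    """Strip the local part of every address, keeping '@domain'."""
    return LOCAL_PART.sub("", text)


def keep_domains_in_file(src: str, dst: str) -> None:
    """Hypothetical helper: process a large list one line at a time."""
    # newline="" on both sides leaves the original line endings as they were
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(LOCAL_PART.sub("", line))
```

A line that contains no @ is left unchanged, and a line with several @ only loses what precedes the first one.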

                      • Md Abdullah Al Noman

                        I want to delete everything between two points in 36000-line XML files,
                        where the portion to delete is repeated.
                        I can insert the points using a simple find and replace, so I would be left with:
                        <Middle></Middle>
                        <WebsiteList></WebsiteList>
                        <EventList></EventList>
                        <Note></Note>
                        <LastName></LastName>
                        START-DELETING… <Photo>
                        nO3df2vjyB3H8akt18KyV8LCVpA5p/USwy3koIX77+7plT6XPpiWDeyCjw2Nl5izg4K1toJ8kUL/
                        [44 further lines of base64-encoded image data]
                        +IjmkgAAAABJRU5ErkJggg==
                        </Photo>
                        END-DELETING
                        <GroupList></GroupList>
                        <Job></Job>

                        Hoping you have the answer.

                        • Terry R

                          Given the example you provided, the following would remove all text between and including the START-DELETING and END-DELETING lines.
                          Find: (START-DELETING.+\R)(.+\R)+(END-DELETING\R)
                          Replace: empty string here

                          So the assumption is that there must be at least 1 line between the 2 identifying lines (START and END); that’s the (.+\R)+ portion of the regex. Also note that the first group (START-DELETING.+\R) includes the .+ because your example also has 3 period characters after it. I’ve put parentheses around each sub-portion just to make it a bit easier to segment out and identify what each group is doing. Only the middle group’s parentheses are absolutely necessary, i.e. (.+\R)+.

                          You say you can (or already did) insert the START and END lines using a simple find and replace. With my regex, you could instead use the original strings you searched for in place of those markers. That would save you 1 or 2 additional steps.

                          Hope this helps.

                          Terry
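Here is a minimal Python sketch of Terry's pattern, assuming LF or CRLF line endings. Python's re has no \R, so (?:\r\n|\n|\r) stands in for it, and delete_block is a hypothetical name. Since "." does not cross line breaks here (the Notepad++ default too), each (.+\R) repetition consumes exactly one non-empty line.

```python
import re

NL = r"(?:\r\n|\n|\r)"  # stand-in for Notepad++'s \R

# Terry's three groups: the START-DELETING line, one or more NON-EMPTY lines,
# then the END-DELETING line with its line break.
BLOCK = re.compile(r"(START-DELETING.+" + NL + r")(.+" + NL + r")+(END-DELETING" + NL + r")")


def delete_block(text: str) -> str:
    """Remove the START-DELETING ... END-DELETING lines, markers included."""
    return BLOCK.sub("", text)


sample = (
    "<Note></Note>\n"
    "START-DELETING… <Photo>\n"
    "nO3df2vjyB3H8akt18KyV8LCVpA5p\n"
    "</Photo>\n"
    "END-DELETING\n"
    "<GroupList></GroupList>\n"
)
print(delete_block(sample))  # only the <Note> and <GroupList> lines remain
```

Because each line inside the block must hold at least one character, a block that contains an empty line does not match and is left unchanged.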

                          • guy038

                            Hello, @md-abdullah-al-noman, @terry-r and All,

                            I think the following regex S/R could be used, too:

                            SEARCH (?s-i)^\h*START-DELETING.+?END-DELETING\R

                            REPLACE Leave EMPTY

                            Notes :

                            • First, the (?s-i) modifiers mean that, from now on:

                              • Any regex dot symbol ( . ) will match absolutely any single character ( standard ones and EOL ones )

                              • The search will be processed in a case-sensitive way ( Non-insensitive ! )

                            • Then, the part ^\h*START-DELETING looks, from beginning of line ( ^ ), for the upper-case string START-DELETING, possibly preceded with some horizontal space characters ( Usual space or tabulation )

                            • At end, the part END-DELETING\R searches for the upper-case string END-DELETING, followed with its line-break character(s)

                            • And the middle part .+? represents the shortest range, of any character, between the two strings START-DELETING and END-DELETING

                            • Finally, as the replacement regex is empty, the whole match is simply deleted
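To see why the shortest range .+? matters, here is a small Python comparison, assuming a file that contains two START-DELETING ... END-DELETING blocks. Python's re lacks \h and \R, so [ \t]* and (?:\r\n|\n|\r) stand in for them, and the m flag is added so that ^ matches at each line start, as in Notepad++. The GREEDY pattern is only there for contrast; it is not part of the answer above.

```python
import re

NL = r"(?:\r\n|\n|\r)"  # stand-in for \R
# Lazy .+? stops at the FIRST END-DELETING after each START-DELETING.
LAZY = re.compile(r"(?sm)^[ \t]*START-DELETING.+?END-DELETING" + NL)
# Greedy .+ runs on to the LAST END-DELETING in the whole text.
GREEDY = re.compile(r"(?sm)^[ \t]*START-DELETING.+END-DELETING" + NL)

SAMPLE = (
    "<Note></Note>\n"
    "START-DELETING… <Photo>\nabc\n</Photo>\nEND-DELETING\n"
    "<Job></Job>\n"
    "  START-DELETING… <Photo>\nxyz\n</Photo>\nEND-DELETING\n"  # indented, like \h*
    "<GroupList></GroupList>\n"
)

print(LAZY.sub("", SAMPLE))    # <Note>, <Job> and <GroupList> lines survive
print(GREEDY.sub("", SAMPLE))  # the <Job> line between the blocks is lost too
```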

                            Best regards,

                            guy038

                            • David Bennett

                              @guy038 you truly are a legend, I agree with the other poster. You are so deep into notepad++ regex, impressive!
                              I believe you may also know this - IMHO quite common - case, although I can’t find it described anywhere:

                              Suppose you have just one large file (wordpress sql database in fact, opened in my favorite editor notepad++) and STRING A and STRING B should always belong together:
                              FIND ALL INSTANCES OF ANY TEXT across lines
                              WHERE STRING A sometime later
                              IS FOLLOWED BY ANOTHER STRING A
                              INSTEAD OF THE “CLOSING” STRING B

                              Example: Find all instances where, across lines, there’s the literal string [/social]
                              and after any kind and number of characters there’s another literal string [/social]
                              BUT in between the two is nowhere a literal string [social] although it should be because [social] and [/social] belong together.

                              So basically in the example case, string A and string B always belong together, there must never follow two A’s or two B’s. Always the A string, then the B string. Then again the A string, then the B string. Etc. And so you need to find any “fault”: where A is followed sometime later by another A, instead of first a B string.

                              Did I explain this well enough?

                              I am sure none of the above, nor anything else I have found, works because I’ve tried them all. Would you have an idea how to go about this?

                              • guy038

                                Hello @david-bennett, and All,

                                Thanks, David. You explained your problem very well. So you’re looking for ranges [social].......[/social] where, unfortunately, one boundary, either [social] or [/social], is missing, aren’t you?

                                As a sample, in the text below, I indicated where a boundary is missing:

                                ...[social].......[/social]..............[/social]........[social]............[social]..........[/social]...
                                                                 ^                                     ^
                                                         [social] missing                      [/social] missing
                                

                                BTW, I also assume that your database does NOT contain nested blocks [social].......[/social], as, for instance:

                                [social]....[social].....[social].....[social].....[/social]....[/social].....[/social]......[social].....[/social]....[/social]...
                                

                                In that case, a possible regex could be:

                                SEARCH (?-s)(?<=\[/social\])((?!\[social\]).)+?(?=\[/social\])|(?<=\[social\])((?!\[/social\]).)+?(?=\[social\])

                                If you apply this regex to the text below, it selects all the zones where a boundary is missing!

                                ...[social].......[/social]..............[/social]........[social]............[social]..........[/social]...
                                                          >              <                       >            <
                                

                                Note that:

                                • If the selection is surrounded with two boundaries [social], then, this selection should contain a [/social] ending boundary

                                • If the selection is surrounded with two boundaries [/social], then, this selection should contain a [social] starting boundary


                                If your text may be split over several lines, preferably use this almost identical regex, which is also correct for ONE-line blocks [social].......[/social]!

                                SEARCH (?s)(?<=\[/social\])((?!\[social\]).)+?(?=\[/social\])|(?<=\[social\])((?!\[/social\]).)+?(?=\[social\])

                                ...[social].......[/social]....
                                                          >
                                .....
                                .....[/social].....
                                     <
                                .......
                                ...[social].......[/social]..............[/social]........[social]............[social]..........[/social]...
                                                          >              <                       >            <
                                ......
                                ...[social]...
                                          >
                                ....
                                .....[social]...
                                     <
                                .......[/social]...
                                

                                Notes :

                                • The square brackets need to be escaped with the \ character, as they have a special meaning in regular expressions

                                • At the beginning, the (?-s) or (?s) modifier determines if the dot meta-character ( . ) represents a single standard character only, or any character

                                • Then the regex engine tries to match one of the two alternatives :

                                  • (?<=\[/social\])((?!\[social\]).)+?(?=\[/social\])

                                  • (?<=\[social\])((?!\[/social\]).)+?(?=\[social\])

                                • The first alternative matches the smallest range of characters ( (....)+? ), surrounded by two strings [/social], due to the look-behind (?<=\[/social\]) and the look-ahead (?=\[/social\])

                                • The second alternative matches the smallest range of characters ( (....)+? ), surrounded by two strings [social], due to the look-behind (?<=\[social\]) and the look-ahead (?=\[social\])

                                • In the first alternative, this range must not contain, at any position, the string [social], due to the negative look-ahead, in the construction (?!\[social\]).

                                • In the second alternative, this range must not contain, at any position, the string [/social], due to the negative look-ahead, in the construction (?!\[/social\]).
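For anyone who wants to test this regex outside Notepad++, it ports almost unchanged to Python's re module, which also supports fixed-width look-behinds. In this sketch, missing_boundary_zones is a hypothetical helper and re.DOTALL plays the role of (?s). Applied to guy038's one-line sample, it should return the two zones marked with > and < above.

```python
import re

# Alternative 1: a stretch between two [/social] tags containing no [social].
# Alternative 2: a stretch between two [social] tags containing no [/social].
MISSING_BOUNDARY = (
    r"(?<=\[/social\])((?!\[social\]).)+?(?=\[/social\])"
    r"|(?<=\[social\])((?!\[/social\]).)+?(?=\[social\])"
)


def missing_boundary_zones(text: str, across_lines: bool = False) -> list[str]:
    """Return every zone selected by the regex; across_lines=True acts like (?s)."""
    flags = re.DOTALL if across_lines else 0
    return [m.group(0) for m in re.finditer(MISSING_BOUNDARY, text, flags)]


sample = ("...[social]" + "." * 7 + "[/social]" + "." * 14 + "[/social]" + "." * 8
          + "[social]" + "." * 12 + "[social]" + "." * 10 + "[/social]...")
print(missing_boundary_zones(sample))  # a 14-dot zone, then a 12-dot zone
```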

                                Best Regards,

                                guy038

                                • David Bennett

                                  Hey @guy038, thanks for replying! And hello to every notepad++ user.

                                  So you are suggesting, in words,

                                  • match a prefix, here [/social], but exclude it from the capture
                                  • capture a group:
                                    — if suffix is absent, here [social]
                                    — and any character, one or more times, but as few as possible
                                  • match a suffix, here [/social], but exclude it from the capture

                                  Is that worded right?

                                  Earlier I had tried many variations with the look-behind and look-ahead as well, because this simple construct makes so much sense. And then in between, to exclude captures where [social] appears, like it normally should.
                                  Your group capture notation ((?!\[social\]).)+?, however, I hadn’t tried; thanks for this new variation in my repertoire. I always used exclusion notations like .*?(?!\[social\]) and even tried .*?[^(\[social\])], which I think is wrong regex syntax in Notepad++ too.

                                  Either way, unfortunately your regex, too, does not find the instance where a [/social] follows a [/social] without the corresponding [social] in between.
                                  (“Corresponding” because [social]…[/social] here is a “shortcode” in WordPress, but could be anything in other text-processing situations.)

                                  Using your regex, in all the variations you gave, in my Notepad++ 7.5.8 highlights the entire text (here a database), i.e. it has 0 hits.

                                  So I was wondering, could there be, logically, any kind of situation where

                                  • notepad++ COUNT [social] has 117 hits

                                  • and notepad++ COUNT [/social] has 118 hits
                                    (as it does)

                                  • and yet, this would NOT be due to the presumed occurrence of one end-marker, here [/social], missing its corresponding start-marker, here [social]?

                                  Because if, logically, such a situation is possible (although I myself can’t think of one), then your regex may be working even though in my particular case it cannot find anything.

                                  Did I explain this puzzle well enough?

                                  Just to clarify, this is not “a Notepad++ oddity”: my Expresso regex software has 0 hits too with your proposed regex, and with all the notations that I had tried earlier.
                                  If any oddity then the oddity must be right within the sql database. But I can’t think of one. Can you or anyone else maybe?
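One situation that fits these counts yet gives the lookaround regex nothing to match: the very first tag in the file is a stray [/social], followed only by correctly alternating [social]...[/social] pairs. Every regex match needs two tags of the same kind with the other kind absent between them, and that never happens in such a file. A plain left-to-right scan of the tags, sketched below in Python, also catches this case. out_of_order_tags is a hypothetical helper and assumes, as guy038 did, that blocks are never nested and that tags are spelled exactly [social] and [/social].

```python
import re

TAG = re.compile(r"\[/?social\]")  # exact, case-sensitive [social] or [/social]


def out_of_order_tags(text: str) -> list[tuple[int, str]]:
    """Return (offset, tag) for each tag breaking strict [social]/[/social] alternation."""
    problems = []
    expect_open = True  # the very first tag should be an opening [social]
    last_open = None    # offset of the most recent [social]
    for m in TAG.finditer(text):
        is_open = m.group(0) == "[social]"
        if is_open != expect_open:
            problems.append((m.start(), m.group(0)))
        if is_open:
            last_open = m.start()
        expect_open = not is_open  # after a tag, the other kind is expected next
    # text ends while a [social] is still waiting for its [/social]
    if not expect_open and (last_open, "[social]") not in problems:
        problems.append((last_open, "[social]"))
    return problems


# A leading stray [/social]: two [/social] tags, one [social], no regex match.
print(out_of_order_tags("..[/social]..[social]..[/social].."))  # [(2, '[/social]')]
```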

                                  • David BennettD
                                    David Bennett
                                    last edited by

                                    Hey @guy038, thanks for replying! And hello to every notepad++ user.

                                    So you are suggesting, in words,

                                    • match a prefix, here [/social], but exclude it from the capture
                                    • capture a group:
                                      — if suffix is absent, here [social]
                                      — and any character, one or more times, but as few as possible
                                    • match a suffix, here [/social], but exclude it from the capture

                                    Is that worded right?
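Taken literally, that wording corresponds to a look-behind for [/social], a tempered group built from ((?!\[social\]).)+?, and a look-ahead for [/social]. guy038's exact expression is not quoted in this part of the thread, so the pattern below is only a reconstruction from the description, run with Python's re module rather than Notepad++. Two details are worth noting. First, the repeated group is made non-capturing and wrapped in an outer capture, so the whole span is captured (the form ((?!\[social\]).)+? keeps only its last character in the group, without changing what is matched). Second, `.` does not match a line break unless DOTALL is on; Notepad++ has a comparable ". matches newline" option in its Find dialog, so markers separated by line breaks are one thing worth ruling out, not a confirmed diagnosis.

```python
import re

# Reconstruction (an assumption) of the regex described in words:
#   (?<=\[/social\])          prefix [/social], not part of the match
#   ((?:(?!\[social\]).)+?)   one or more characters, as few as possible,
#                             none of them starting a [social]
#   (?=\[/social\])           suffix [/social], not part of the match
PATTERN = r"(?<=\[/social\])((?:(?!\[social\]).)+?)(?=\[/social\])"

# Made-up sample: the second [/social] has no [social] before it,
# and the text between the two closers spans line breaks.
text = "[social]a[/social]\nstray text\n[/social] [social]b[/social]"

# Without DOTALL, "." stops at the line break, so nothing is found.
print(re.findall(PATTERN, text))             # []

# With DOTALL, the span between the two consecutive [/social] is captured.
print(re.findall(PATTERN, text, re.DOTALL))  # ['\nstray text\n']
```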

                                    Earlier I had tried many variations with the look-behind and look-ahead as well, because this simple construct makes so much sense, plus something in between to exclude matches where [social] appears, as it normally should.
                                    However, I hadn’t tried your group notation ((?!\[social\]).)+?, so thanks for this new variation in my assortment. I had always used exclusion notations like .*?(?!\[social\]), and even tried .*?[^(\[social\])], which I think is also the wrong regex syntax in Notepad++.
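For comparison, here is what the two earlier attempts actually match, sketched with Python's re module on made-up strings. Inside square brackets, [^(\[social\])] is a character class: it matches one single character that is none of ( [ s o c i a l ] ), so it cannot exclude the word [social]. In .*?(?!\[social\]), the look-ahead only tests the point where the lazy .*? ends, so nothing in between is excluded. The tempered form (?:(?!\[social\]).)* repeats the look-ahead before every character.

```python
import re

# [^(\[social\])] is a character class: ONE character that is not
# one of ( [ s o c i a l ] ). It cannot exclude the word "[social]".
print(re.findall(r"[^(\[social\])]", "social x"))   # [' ', 'x']

# .*?(?!\[social\]): the look-ahead only tests where the lazy .*? ends.
# Here .*? just takes "[" and stops, since "social]abc" does not
# start with "[social]", so nothing in between is excluded.
print(re.match(r".*?(?!\[social\])", "[social]abc").group())   # [

# Tempered dot: the look-ahead is checked before EVERY character,
# so the match stops just before "[social]".
print(re.match(r"(?:(?!\[social\]).)*", "ab[social]cd").group())  # ab
```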

                                    Either way, unfortunately your regex does not find the instance where a [/social] follows another [/social] without the corresponding [social] in between either.
                                    (“Corresponding” because [social]…[/social] here is a “shortcode” in WordPress, but it could be any pair of markers in other text-processing situations.)

                                    Using your regex, in all the variations you gave, in my Notepad++ 7.5.8 highlights the entire text (here a database), i.e. it gets 0 hits.

                                    So I was wondering, could there be, logically, any kind of situation where

                                    • notepad++ COUNT [social] has 117 hits

                                    • and notepad++ COUNT [/social] has 118 hits
                                      (as it does)

                                    • and yet, this would NOT be due to the presumed occurrence of one end-marker, here [/social], missing its corresponding start-marker, here [social]?

                                    Because if, logically, such a situation is possible (even though I myself can’t think of one), then your regex may be working even though, in my particular case, it cannot find anything.

                                    Did I explain this puzzle well enough?

                                    Just to clarify, this is not “a Notepad++ oddity”: my Expresso regex software also gets 0 hits with your proposed regex, and with all the notations I had tried earlier.
                                    If there is an oddity, then it must be within the SQL database itself. But I can’t think of one. Can you, or anyone else?


                                    Edit: I went back up to add some extra characters, as the comment software here seems to require ESCAPING (like regex does); otherwise it shows a DIFFERENT text-to-be-posted (even on the right side, WHILE you are writing) from what you input (even WHILE you type it on the left side).
                                    Hopefully now the OUTPUT text matches my INPUT text…
                                    How do YOU get your regex notations to show up LIKE YOU ENTER THEM more easily? Yours come up in red on a grey background?

                                    I am posting this again, because “You are only allowed to edit posts for 180 second(s) after posting”… and then even “As new user you can only post once every 1200 seconds” - lol, such bureaucracy makes even genuine comments like mine needlessly difficult…

                                    • guy038G
                                      guy038
                                      last edited by guy038

                                      Hi, @david-bennett, and All,

                                      OK, so my regex does not match anything. Quite disappointing :-(( But I don’t give up !

                                      If you get no result, this means that, either :

                                      • My regular expression is not well constructed or my concept, to solve your problem, is erroneous

                                      • Some characters in your text, or its general layout, prevent us from obtaining positive results

                                      • Maybe both of the above apply at the same time :-((
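One way to test the "some characters in your text" hypothesis is to list markers that almost, but not exactly, look like [social] or [/social]: a different letter case, or spaces inside the brackets. The sketch below is a hypothetical Python helper; the variants it looks for are assumptions chosen for illustration, not something known to be in the data. Note that a Notepad++ search for [social] with Match case unticked would count [Social] as a hit, while the same search with Match case ticked would not.

```python
import re

# Loose pattern (an assumption about which variants are worth checking):
# any letter case, optional spaces around an optional "/".
LOOSE = re.compile(r"\[\s*/?\s*social\s*\]", re.IGNORECASE)
EXACT = {"[social]", "[/social]"}

def suspicious_markers(text: str) -> list[tuple[int, str]]:
    """Return (line number, marker) for each near-miss of the exact markers."""
    found = []
    for m in LOOSE.finditer(text):
        if m.group(0) not in EXACT:
            line = text.count("\n", 0, m.start()) + 1
            found.append((line, m.group(0)))
    return found

sample = "[social]a[/social]\n[Social]b[/social]\n[social]c[ /social]"
print(suspicious_markers(sample))  # [(2, '[Social]'), (3, '[ /social]')]
```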

                                      So, if you don’t mind, and if your data is neither confidential nor personal, you could send it ( or part of it ) to me. Here is, below, my e-mail address :

                                      Working with real data is always better and, anyway, Notepad++ is really a Swiss Army knife ! So, no doubt about it : we will eventually find an acceptable solution ;-))


                                      Regarding the red on gray background, you can obtain it by wrapping your text between two grave accents ( ` )

                                      For instance

                                      • If you write `text in red on gray`                =>    normal text :          text in red on gray

                                      • If you write *`text in red on gray`*            =>    text in italic :         text in red on gray

                                      • If you write **`text in red on gray`**        =>    text in Bold :          text in red on gray

                                      • If you write ***`text in red on gray`***    =>    text in Bold-Italic :          text in red on gray


                                      Also refer to the excellent summary of the Markdown syntax on our forum, below, by @scott-sumner !

                                      https://notepad-plus-plus.org/community/topic/14262/how-to-markdown-code-on-this-forum/4

                                      And this FAQ Desk post, from @peterjones, gives some additional information :

                                      https://notepad-plus-plus.org/community/topic/15739/faq-desk-request-for-help-without-sufficient-information-to-help-you/1

                                      Best Regards,

                                      guy038

                                      P.S. :

                                      David, it would be particularly interesting if you could send me the part where you got 117 hits for [social] and 118 hits for [/social] !

                                      • David BennettD
                                        David Bennett
                                        last edited by

                                        First, thanks so much for replying! @guy038
                                        Maybe you are an official here (which would explain why you know so much); either way, I much appreciate you taking the time!

                                        Second,

                                        “If you get no result, this means that, either :…”

                                        Well no, as I hinted, the reason is likely at my end, lol:

                                        “So I was wondering, could there be, logically, any kind of situation where notepad++ COUNT [social] has 117 hits, and notepad++ COUNT [/social] has 118 hits (as it does), and yet, this would NOT be due to the presumed occurrence of one end-marker, here [/social], missing its corresponding start-marker, here [social]? - Because if, logically, such a situation is possible (even though I myself can’t think of one), then your regex may be working even though, in my particular case, it cannot find anything.”

                                        Again, your regex may be working well in other raw texts :-)
                                        So were you, or anyone else, able to think of a “logical” possibility/explanation of the above COUNTS?

                                        Also, thanks a lot for your “markdown” explanation and for Scott’s helpful link; I saved it to my multi-clipboard and just have to remember it. The “quote” tip I already used above, as you noticed.

                                        Well, posting the raw DB publicly certainly is not wise, but sending it to you would be no problem, I think. I assume you would find that your regex works in general, and maybe even find out why it doesn’t work here. So I think it would be good to know both, yes. :-)

                                        Presumably, had you found a flaw in my initial assumption (above) you would have raised it, right @guy038 ?

                                        If anyone else finds a flaw in it, shout it out loud, will ya?

                                        • guy038G
                                          guy038
                                          last edited by

                                          Hi, @david-bennett, and All,

                                          Hum…, not totally sure if I got your message properly, but the N++ COUNT feature always scans the entire current file, even if one or several selections are present, and whether the Wrap around option is set or unset !

                                          Of course, in Normal mode, for instance, the count result may be different if you tick/untick the Match case and/or the Match whole word only options.

                                          So, to my mind, if we assume that :

                                          • The Match case AND the Match whole word only options are not ticked :

                                          • No parameter has been changed, in the Find dialog, between the two COUNT operations ( except for the Find what: zone, of course ! )

                                          => The counts of the presumably well-balanced strings [social] and [/social] should always be identical. If NOT, it necessarily means that there are, indeed, one or several additional occurrences of one of the boundaries :-((
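When the two counts differ, a single pass over the markers can point at the extra one, wherever it sits. The sketch below assumes [social] shortcodes are never nested; it uses a stack to pair each [/social] with the most recent unclosed [social], and returns the line numbers of any [/social] left without a partner, plus any [social] left open. The function name is hypothetical, and the search is case-sensitive, like a Notepad++ search with Match case ticked. With 117 openers and 118 closers, at most 117 closers can be paired, so the first list is guaranteed to contain at least one line number.

```python
import re

MARKER = re.compile(r"\[(/?)social\]")

def unbalanced_markers(text: str) -> tuple[list[int], list[int]]:
    """Return line numbers of orphan [/social] and of unclosed [social] markers."""
    open_lines = []        # stack: line numbers of [social] not yet closed
    orphan_closers = []    # line numbers of [/social] with no open [social]
    for m in MARKER.finditer(text):
        line = text.count("\n", 0, m.start()) + 1
        if m.group(1):     # matched "[/social]"
            if open_lines:
                open_lines.pop()
            else:
                orphan_closers.append(line)
        else:              # matched "[social]"
            open_lines.append(line)
    return orphan_closers, open_lines

sample = "[/social] header\n[social]a[/social]\n[social]b[/social]"
print(unbalanced_markers(sample))  # ([1], [])
```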

                                          Cheers,

                                          guy038

                                          • David BennettD
                                            David Bennett
                                            last edited by

                                            the N++ COUNT feature always scans the entire current file, even if one or several selections are present, and whether the Wrap around option is set or unset !

                                            Agreed. No doubt about that. And regardless of the options set, both normal and regex search always find one more [/word] than [word]. I verified it for each potential oddity. ;-)
                                            And yet, your earlier proposed regexes (is that a word?) give 0 hits. In regex search mode, lol.

                                            Did you get the SQL raw text I sent to the email you … made public? Did you find out if your suggested regex works some other way in it?
                                            I know that Notepad++ regex normally works in this and any SQL db, because I am using it all the time, successfully. So now you’ve got me curious: what is your explanation for why your regexes fail in this SQL db?

                                            The Community of users of the Notepad++ text editor.
                                            Powered by NodeBB | Contributors