    Interrupted list

    • Dumitru S.

      Hi to all!
      I have a problem matching strings with a regex. I tried, but I still need help.

      I summarized my situation in four points below:

      1. a string like this to be matched:
        Abc 2:5a; 24:51d, 53; 1:9b, 22-23c; 1:22-23, 9; 1:22-23, 24-25;
      2. this string should not be matched if it is found at the start of a line, that is, a new row; only in the middle or at the end of a row.
      3. if in the middle or at the end of a string/row, it should match only if it starts with a capital letter. E.g. Abc, not abc
      4. After matching, the replacement I need looks like this:
        Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23c; Abc 1:22-23, 9; Abc 1:22-23, 24-25;

      This is what I tried, but it is not good enough: ((((\d)?\w*(.)?\s\d*:\d*)((-|:|,)?\s)?)(((-)?|(\d*))?(,)?(\s)?){5})

      Thank you for any help.
      Dumitru.

      • guy038

        Hello, @dumitru-s and All,

        I think it would be good to use two successive steps :

        1) Identify the lines where a replacement must occur ( a string beginning with an upper-case letter followed by some lower-case letters, not at the very start of the current line, and followed by a space and some digit characters )

        2) Replace any ; character with the same ;, followed with a space char and the first word beginning the line, after the leading space characters


        • First, I advise you to back up the file(s) involved in the future replacements ;-))

        • Then, use this first regex S/R, which re-copies the key word at the very end of the current line, after the last ; character

          • SEARCH (?-is)(?!^)(\u\l+)(?=\x20\d+).+

          • REPLACE $0\1

        Just tell me if this regex seems to be not restrictive enough and matches something else, which is unwanted !


        • Now, if this first step is OK, use this second regex S/R, below, which :

          • Adds, after any ; char, except the last one, on the appropriate lines, a space char and the first word, beginning the line after the leading space chars

          • Deletes the temporary first word, located after the last ; character

            • SEARCH (?-is);(?=.+;(\u\l+)$)|\u\l+$

            • REPLACE ?1;\x20\1

        If OK, I could give you some explanations on these regex S/R, next time !
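For readers who want to try the two line-based steps above outside Notepad++, here is a minimal Python sketch. It is a port made under assumptions, not guy038's own code: Python's `re` has no `\u` / `\l` classes and no `?1` conditional replacement, so `[A-Z][a-z]+` and a replacement function stand in for them. The function name `expand_line_end` is hypothetical.

```python
import re


def expand_line_end(text):
    """Port of the two line-based search/replace steps (assumed equivalents)."""
    # Step 1: copy the key word (capitalised word followed by a space and
    # digits, not at the very start of a line) to the end of that line.
    step1 = re.sub(
        r"(?!^)([A-Z][a-z]+)(?=\x20\d+).+",
        r"\g<0>\1",
        text,
        flags=re.MULTILINE,
    )
    # Step 2: after every ";" except the last, insert a space and the key word;
    # the temporary copy at the end of the line is deleted.
    return re.sub(
        r";(?=.+;([A-Z][a-z]+)$)|[A-Z][a-z]+$",
        lambda m: f"; {m.group(1)}" if m.group(1) else "",
        step1,
        flags=re.MULTILINE,
    )


print(expand_line_end("text before Abc 2:5a; 24:51d, 53; 1:9b, 22-23c;"))
```

As with the original pair, the second pattern removes any capitalised word that ends a line, so this sketch is only safe on text where step 1 has marked every such line.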

        Best Regards,

        guy038

        • Dumitru S.

          Yes. Both steps work very well, and it was just as you said. Also, the first regex is indeed not restrictive enough: if there is another word or another sentence after that, this regex captures it all, all the way to the CR/LF that marks the end of the paragraph.
          Thank you!

          • PeterJones @Dumitru S.

            @Dumitru-S ,

            You’ll notice that @guy038 had to make guesses, and wasn’t able to guess right. Do you know why this is? Because you gave one line of text which should match, and that was it. The regex you had tried helped give insight into your difficulty – so thank you for sharing that – but that’s not enough.

            Read and take to heart the advice below. It will get you better answers in the future. The better you describe what you have and what you want (with good examples), the better our answers will be.

            ----

            Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regex in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

            • Dumitru S.

              You are right. I did not realize I was not very accurate. Thank you for your honest comment.

              By the way, I downloaded some books that I found checking out your link Regular Expression FAQ. I already started to learn from the sites provided by you, but I would really like to know it better. So, thank you.

              • As shown in the picture below, what I need is to find this:
                Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;
                (Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)
                (3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)
                3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;

              • and to be replaced with this:
                Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25;
                (Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25)
                (3Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25)
                3Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25;

              NP++.png

              • As @guy038 said both steps work very well, and it was just as he said. Also, the first regex is indeed not restrictive enough. As an example, you can see how it goes all the way to the end of the paragraph, as shown in the picture below:

              np++.png

              I hope this time I was much more accurate, and that I do show that I am willing to learn.

              Have a good day!
              Thank you!

              • guy038

                Hi, @dumitru-s, @peterjones and All,

                OK ! So, the way to solve your problem has not changed : once the strings are well identified, with an appropriate search regex, we re-copy the key string ( Abc ), right after each string.

                However, as we cannot rely on the end of line location, we must locate this key string, in a unique way. I chose the temporary sequence @Abc@


                So, starting with this text, ( manually ! ) extracted from your picture :

                fermentum (Abc 22:51). Mauris fringilla et massa (3Ab 24:51, 53; 3Ab 24:51,53) eget congue
                frigilla, Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25; malesuada
                nisi a turpis Abc 1:9, 22-23; 3Ab 1:22-23, 9 efficitur.
                lorem vehicula, 3Abc 1:12-23, 24-25 venelactis lectus
                at ullamcorper (Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25). Integer arcu
                Abc 22:51 Lorem ipsum 
                fermentum (Abc 22:51). Mauris fringilla et massa (3Ab 24:51, 53; 3Ab 24:51,53) eget congue
                a turpis (3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25). Praesent consectetur
                vitae quam 3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25; at ullamcorper.
                

                I’ll express the regexes with the free-spacing mode, for a better readability. In this mode, any space char is not part of the regex ( Use \x20 or escape the space char with \, to search for a space char ! )

                However, note that this mode is not available for the replacement regex !

                So, this first regex S/R :

                • Looks for the correct strings needing modifications

                • Inserts the key word Abc, surrounded by @ chars, at the end of each string :

                SEARCH (?x-i) (?<=\x20) \(? (\d+)? ( \u\l+ ) ( \x20 \d+ : \d+ \l? (- \d+ \l? )? ( ,\x20 \d+ (- \d+)? )? [;)] ) {2,}

                REPLACE $0@\2@

                and gives this changed text :

                fermentum (Abc 22:51). Mauris fringilla et massa (3Ab 24:51, 53; 3Ab 24:51,53) eget congue
                frigilla, Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;@Abc@ malesuada
                nisi a turpis Abc 1:9, 22-23; 3Ab 1:22-23, 9 efficitur.
                lorem vehicula, 3Abc 1:12-23, 24-25 venelactis lectus
                at ullamcorper (Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)@Abc@. Integer arcu
                Abc 22:51 Lorem ipsum 
                fermentum (Abc 22:51). Mauris fringilla et massa (3Ab 24:51, 53; 3Ab 24:51,53) eget congue
                a turpis (3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)@Abc@. Praesent consectetur
                vitae quam 3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;@Abc@ at ullamcorper.
                

                You probably noticed that some parts like :

                • (3Ab 24:51, 53; 3Ab 24:51,53), already in its final state

                • Abc 22:51, beginning the line

                • 3Abc 1:12-23, 24-25 being a one-section string

                are correctly ignored by the search regex !


                Now, the second regex S/R :

                • Looks for any ; character, not at the end of each expression, and replaces it with a ; char, followed by a space char and the key word Abc

                • And, at end of each string, deletes the temporary string @Abc@

                SEARCH (?x-is) ; (?= .+ [;)] @ ( \u\l+ ) @ ) | @ .+? @

                REPLACE ?1;\x20\1

                Giving your expected text :

                fermentum (Abc 22:51). Mauris fringilla et massa (3Ab 24:51, 53; 3Ab 24:51,53) eget congue
                frigilla, Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25; malesuada
                nisi a turpis Abc 1:9, 22-23; 3Ab 1:22-23, 9 efficitur.
                lorem vehicula, 3Abc 1:12-23, 24-25 venelactis lectus
                at ullamcorper (Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25). Integer arcu
                Abc 22:51 Lorem ipsum 
                fermentum (Abc 22:51). Mauris fringilla et massa (3Ab 24:51, 53; 3Ab 24:51,53) eget congue
                a turpis (3Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25). Praesent consectetur
                vitae quam 3Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25; at ullamcorper.
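The whole two-step procedure above can also be sketched in Python, for anyone who wants to check it against the sample lines without Notepad++. This is an assumed port, not part of the original answer: `[A-Z][a-z]+` replaces Boost's `\u\l+`, a replacement function replaces the `?1` conditional, and the names `STEP1`, `STEP2` and `expand_references` are hypothetical.

```python
import re

# Step 1 pattern: free-spacing mode via the inline (?x) flag, as in the thread.
STEP1 = re.compile(
    r"(?x) (?<=\x20) \(? (\d+)? ( [A-Z][a-z]+ )"
    r" ( \x20 \d+ : \d+ [a-z]? (- \d+ [a-z]? )? ( ,\x20 \d+ (- \d+)? )? [;)] ){2,}"
)
# Step 2 pattern: a ";" that still has a marked end ahead, or the marker itself.
STEP2 = re.compile(r"(?x) ; (?= .+ [;)] @ ( [A-Z][a-z]+ ) @ ) | @ .+? @")


def expand_references(text):
    # Append the temporary @Key@ marker after each multi-section string.
    marked = STEP1.sub(r"\g<0>@\2@", text)
    # Insert "Key" after each inner ";" and drop the marker.
    return STEP2.sub(lambda m: f"; {m.group(1)}" if m.group(1) else "", marked)


SAMPLE = """\
frigilla, Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25; malesuada
nisi a turpis Abc 1:9, 22-23; 3Ab 1:22-23, 9 efficitur.
Abc 22:51 Lorem ipsum
a turpis (3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25). Praesent consectetur
"""

print(expand_references(SAMPLE))
```

Only the first and last sample lines should change; the one-section string and the string at the start of a line are left alone, as in the thread.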
                

                As usual, tick, preferably, the Wrap around option and select the Regular expression search mode

                Note also that the key-words can be different for each string ( Abc, Abcdef, 3Ztest, 456Test, … )

                Best Regards,

                guy038

                • Dumitru S.

                  @guy038 said in Interrupted list:

                  REPLACE $0@\2@

                  Thank you @guy038!
                  I marveled when I saw that what you wrote really worked. They work like a charm, and that goes to your credit and credibility. That is why I am coming back to simply point that out.

                  1. To summarize, your regex worked nicely in two steps:
                    This 1st step:
                    SEARCH: (?x-i) (?<=\x20) \(? (\d+)? ( \u\l+ ) ( \x20 \d+ : \d+ \l? (- \d+ \l? )? ( ,\x20 \d+ (- \d+)? )? [;)] ) {2,}
                    REPLACE $0@\2@

                  made this text:
                  text before Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;
                  text before (Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)
                  text before (3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)
                  text before 3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;

                  into this:
                  text before Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;@Abc@
                  text before (Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)@Abc@
                  text before (3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)@Abc@
                  text before 3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;@Abc@


                  And this 2nd step:
                  SEARCH: (?x-is) ; (?= .+ [;)] @ ( \u\l+ ) @ ) | @ .+? @
                  REPLACE: ?1;\x20\1

                  made the (above produced) text:
                  text before Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;@Abc@
                  text before (Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)@Abc@
                  text before (3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25)@Abc@
                  text before 3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25;@Abc@

                  into this:
                  text before Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25;
                  text before (Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25)
                  text before (3Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25)
                  text before 3Abc 2:5a; Abc 24:51d, 53; Abc 1:9b, 22-23; Abc 1:22-23c, 9; Abc 1:22-23, 24-25;

                  2. Thank you @guy038 for this real masterpiece! I could never have done that. I have to study more. I really like the accurate way you work, and this pushed me also to be more accurate. I have actually never thought that I am so sloppy until now. This makes me work more on my regex cases and when I have some more time to work on it I will just come back to you soon enough. So thank you for this and have a nice day today! I go back to study some more.
                    Dumitru.
                  • Dumitru S. @guy038

                    Hi @guy038 !
                    In a previous post you mentioned that you might give some explanation of these regex S/R. So, I would like to ask whether you would help me understand what just happened, especially with the following two sets of regexes that you offered me:

                    SEARCH: (?x-i) (?<=\x20) \(? (\d+)? ( \u\l+ ) ( \x20 \d+ : \d+ \l? (- \d+ \l? )? ( ,\x20 \d+ (- \d+)? )? [;)] ) {2,}
                    REPLACE $0@\2@

                    SEARCH: (?x-is) ; (?= .+ [;)] @ ( \u\l+ ) @ ) | @ .+? @
                    REPLACE: ?1;\x20\1

                    This explanation of the regex above would be a real help and a true progress for me personally.
                    Thank you!

                    • Dumitru S.

                      Hi @guy038 and all!
                      I went and I studied all that you did above and I understood everything except \x20. So, if you do not mind to explain what it stands for, that would be wonderful. Thank you!

                      • Alan Kilborn @Dumitru S.

                        @Dumitru-S

                        \x20 is another way of expressing a single space character.
                        It is often used in regular expression posts here because it is much easier to see than a space.

                        • PeterJones @Alan Kilborn

                          @Alan-Kilborn said in Interrupted list:

                          \x20 is another way of expressing a single space character.
                          It is often used in regular expression posts here because it is much easier to see than a space.

                          @Dumitru-S ,

                          Further to what Alan said: because the FIND expression you are using includes the (?x-i) and (?x-is) syntax, normal whitespace in the regular expression is used only for readability and will not match space characters; in order to match the space character in a (?x) expression, you have to encode the space – and, as Alan explained, \x20 is one way to encode the space character: see the description of \xℕℕ in the usermanual.
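The same effect can be observed in Python, whose `re.VERBOSE` flag is the counterpart of `(?x)`. The short sketch below is only an illustration in another regex engine, not Notepad++ behaviour itself; the helper name `matches` is hypothetical.

```python
import re

# Without free-spacing mode, the literal space in the pattern matches a space.
plain = re.compile(r"Abc 2")

# In free-spacing mode the unescaped space is ignored, so this is really "Abc2".
verbose_bare = re.compile(r"Abc 2", re.VERBOSE)

# Encoding the space as \x20 keeps it significant in free-spacing mode.
verbose_hex = re.compile(r"Abc \x20 2", re.VERBOSE)


def matches(pattern, text):
    """Return True when the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


print(matches(plain, "Abc 2:5a"))
print(matches(verbose_bare, "Abc 2:5a"))
print(matches(verbose_hex, "Abc 2:5a"))
```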

                          • guy038

                            Hello, @dumitru-s, @peterjones, @alan-kilborn and All,

                            Sorry for my very late reply ! Anyway, the good thing is that you studied some regex formulations, in between ;-)) Thanks to Alan and Peter, you have the explanation about the \x20 syntax. When using the red Markdown marking, I very often adopt this form, instead of a true space char, for the reason below :

                            On our forum, multiple space characters are normally compressed into a single one ! Indeed, create a new post or start a reply to someone and type in abc, then hit the space bar 10 times and type in def, as below :

                            abc          def
                            

                            It corresponds to a regex, which, simply, looks for the string abc, followed by 10 space characters and the string def

                            As you can see, in the preview panel, on the right side, we simply get the string abc def. But, if I decide to use, instead, the regex abc\x20{10}def, no more ambiguity !

                            The \xNN or \x{NN} syntaxes, where each N represents a single hexadecimal digit from 0 to F, can be used to search for any character with hexadecimal code NN in an ANSI-encoded file

                            In order to find any Unicode char of the BMP ( Basic Multilingual Plane ) in a Unicode-encoded file, use, preferably, the \x{NNNN} syntax
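A small Python sketch of the same idea, for comparison only. Python's `re` accepts the two-digit `\xNN` form but not the braced `\x{NNNN}` form; `\uNNNN` is its counterpart for BMP characters. The names `ten_spaces`, `e_acute` and `is_exact_match` are hypothetical.

```python
import re

# "abc", exactly ten space characters (hex code 20), then "def".
ten_spaces = re.compile(r"abc\x20{10}def")

# Python counterpart of \x{00E9}: the BMP character with hex code 00E9.
e_acute = re.compile(r"\u00e9")

sample = "abc" + " " * 10 + "def"


def is_exact_match(text):
    """Return True when the whole text is abc + ten spaces + def."""
    return ten_spaces.fullmatch(text) is not None


print(is_exact_match(sample))
print(is_exact_match("abc def"))
```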

                            Now, if you really want to insert a text containing multiple consecutive space chars, use, for instance, this Markdown syntax :

                            ~~~z

                            Raw        text      here,     with     respect    of      SPACE      chars
                            

                            ~~~


                            Let’s go back to the regex explanations ! If we use the ABNF syntax ( Augmented Backus–Naur Form ) ( Refer here ) with the definitions :

                            SP      =  %x20     ;  SPACE char
                            upper   =  %x41-5A  ;  A-Z
                            lower   =  %x61-7A  ;  a-z
                            digit   =  %x30-39  ;  0-9
                            number  =  1*digit  ;  From 1 to any number
                            

                            And being aware of the general notations [X Y ... Z], which means 0*1(X Y ... Z), and (X / Y / ... / Z), which means X OR Y OR ... OR Z

                            Your expression (3Abc 2:5a; 24:51d, 53; 1:9b, 22-23; 1:22-23c, 9; 1:22-23, 24-25), preceded by a space char, can be expressed as :

                            SP ["("] [number] upper 1*lower 2*( SP number ":" number [lower] [ "-" number [lower] ] [ "," SP number [ "-" number ] ] ( ";" / ")" ) )

                            leading to the search expression of the first regex S/R :

                            (?x-i)                    #  FREE-SPACING mode / CASE-SENSITIVE search
                            (?<=\x20)                 #  IF PRECEDED by a SPACE char, searches for ...
                            \(?                       #  an OPTIONAL OPENING parenthesis followed by ...
                            (\d+)?                    #  an OPTIONAL GROUP 1 containing a NUMBER and ...
                            ( \u\l+ )                 #  an UPPER-CASE letter, followed by some LOWER-CASE letters, stored as GROUP 2, which will be inserted, during REPLACEMENT, between TWO @ chars
                            (                         #  START of GROUP 3
                              \x20 \d+ : \d+ \l?      #    A SPACE char followed by a NUMBER, a COLON char, another NUMBER and an OPTIONAL LOWER-CASE letter
                              ( - \d+ \l? ) ?         #    An OPTIONAL GROUP 4, containing a DASH, followed by a NUMBER and an OPTIONAL LOWER-CASE letter
                              (                       #    START of an OPTIONAL GROUP 5, containing :
                                , \x20 \d+ (- \d+)?   #      A COMMA followed by a SPACE char, then a NUMBER followed by an OPTIONAL GROUP 6, containing a DASH char followed by a NUMBER
                              )?                      #    END of the OPTIONAL GROUP 5
                              [;)]                    #    Ending with either a SEMICOLON char OR a CLOSING parenthesis
                            ) {2,}                    #  END of GROUP 3, present AT LEAST TWO times
                            

                            In replacement, we simply rewrite the overall match $0, followed with the group 2, \2, between two @ chars !
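To see which text ends up in which group, the search expression can be ported to Python and inspected on one of the sample strings. This is a sketch under assumptions: `[A-Z]` and `[a-z]` stand in for `\u` and `\l`, which Python's `re` does not provide, and the name `SEARCH` is hypothetical.

```python
import re

SEARCH = re.compile(r"""
    (?<=\x20)                  # preceded by a space
    \(?                        # optional opening parenthesis
    (\d+)?                     # group 1: optional leading number
    ([A-Z][a-z]+)              # group 2: the key word
    (                          # group 3: one section, repeated
      \x20 \d+ : \d+ [a-z]?
      (- \d+ [a-z]?)?          # group 4: optional range end
      (, \x20 \d+ (- \d+)?)?   # groups 5 and 6: optional second item
      [;)]
    ){2,}                      # at least two sections
""", re.VERBOSE)

m = SEARCH.search("a turpis (3Abc 2:5a; 24:51d, 53). Praesent")
whole, number, key = m.group(0), m.group(1), m.group(2)

# Python spelling of the replacement: whole match, then group 2 between @ chars.
marked = m.expand(r"\g<0>@\2@")

print(whole, number, key, marked, sep="\n")
```

Note that only the key word in group 2 is copied; the leading number in group 1 is captured but never reused, which is why `3Abc` expands to `Abc` in the later sections.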


                            Now, regarding the second regex S/R, we must :

                            • Add the string, located between two @ chars, after each ;, but the last

                            • And, of course, delete the temporary string @ ••••• @, added by the first S/R, above

                            leading to this search regex :

                            (?x-is)               #  FREE-SPACING mode / CASE-SENSITIVE search / A DOT matches a SINGLE STANDARD character ONLY
                              ;                   #    Searches for a SEMICOLON char ...
                              (?=                 #    ONLY IF :
                                .+                #      followed by the GREATEST range of chars till ...
                                [;)]              #      a LAST ; character OR a CLOSING parenthesis
                                @                 #      followed by an @ char and ...
                                ( \u\l+ )         #      an UPPER-CASE letter and some LOWER-CASE letters, stored as GROUP 1
                                @                 #      and followed by a LAST @ char
                              )                   #    END of the LOOK-AHEAD assertion
                            |                     #  OR
                              @ .+? @             #    searches for an @ char, followed by the SMALLEST range of STANDARD characters, till ... a SECOND @ char
                            

                            In replacement :

                            • we use a conditional syntax ?1•••• which means that the searched semicolon will be replaced by itself, followed with a space char \x20 and the contents of group 1 \1 ONLY IF the group 1 exists ( i.e. if the first alternative matched ! )

                            • If the second alternative matches, as group 1 does not exist, the temporary string @•••••@ is then deleted !
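The conditional replacement can be imitated in Python with a replacement function, which makes the two cases explicit. This is an assumed equivalent rather than the Boost `?1` syntax itself: when the second alternative matches, `match.group(1)` is `None`. The names `STEP2` and `conditional_replacement` are hypothetical.

```python
import re

STEP2 = re.compile(r"""
    ; (?= .+ [;)] @ ([A-Z][a-z]+) @ )   # first alternative: sets group 1
  | @ .+? @                             # second alternative: group 1 stays unset
""", re.VERBOSE)


def conditional_replacement(match):
    key = match.group(1)
    if key is not None:
        # First alternative matched: keep the ";" and append the key word.
        return f"; {key}"
    # Second alternative matched: delete the temporary marker.
    return ""


marked = "Abc 2:5a; 24:51d, 53;@Abc@ tail"

# Each match paired with the content of group 1 (None when it did not take part).
alternatives = [(m.group(0), m.group(1)) for m in STEP2.finditer(marked)]
result = STEP2.sub(conditional_replacement, marked)

print(alternatives)
print(result)
```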

                            Best Regards,

                            guy038

                            • Dumitru S.

                              Hi to all!

                              Thank you a lot for your very comprehensive response.
                              I appreciate it.

                              Have a nice day today!
                              Best regards,
                              Dumitru
