
    RegEx Quandary

      Roy Simpson

      Hello everyone.

      I have an HTML document that contains the following:

      For a function \(  g(x)=f(x)+k,  g(x)=f(x)+k,\) the function \(  f( x )  f( x )\) is shifted vertically \(  k  k\) units.
      

      Due to the importing process, every inline math expression appears twice (see the repeated text within each \( \) pair).

      I am new to RegEx, but I am trying the following:

      Find:

      \\(\s*(.{1,}?)(.{1,})\s*\\)
      
      

      Replace:

      \\(\1\\)
      

      The intent is to return only the first half of the text inside \( \).

      For some reason, the Find and Replace is highlighting all text from the first occurrence of \( to the final occurrence of \) on a line. What am I missing?

      Thanks in advance!

        Roy Simpson @Roy Simpson

        After a few more moments of searching the interwebs, I don’t think this is possible via RegEx alone. I would need an automated way to split a string in half, and I believe RegEx is not capable of computing the length of a string.
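For comparison, the "split the string in half" idea can be sketched outside the editor. This is only a rough Python illustration, not something from the thread: the helper names `halve_if_doubled` and `dedupe_inline_math` are made up, and it assumes the two copies are separated only by whitespace.

```python
import re

def halve_if_doubled(inner: str) -> str:
    """Return one copy of `inner` when it is the same text written twice."""
    text = inner.strip()
    mid = len(text) // 2
    first, second = text[:mid].strip(), text[mid:].strip()
    # Collapse only when both halves are identical; otherwise keep the input.
    return first if first and first == second else inner

def dedupe_inline_math(line: str) -> str:
    # \\\( and \\\) match the literal delimiters \( and \).
    return re.sub(
        r"\\\((.*?)\\\)",
        lambda m: "\\(" + halve_if_doubled(m.group(1)) + "\\)",
        line,
    )

sample = r"For a function \(  g(x)=f(x)+k,  g(x)=f(x)+k,\) the function \(  f( x )  f( x )\) is shifted vertically \(  k  k\) units."
print(dedupe_inline_math(sample))
```

An expression whose two halves differ in any character other than the surrounding whitespace is left untouched by this sketch.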

          Terry R @Roy Simpson

          @Roy-Simpson said in RegEx Quandary:

          I don’t think this is possible via RegEx alone

          Well, it can be done with regex.
          So far I have Find What: \\\(\s*([^\s,]+),\s*\1
          I haven’t presented a finished solution, partly because I’m not sure whether the spaces before each formula and the comma after it are exactly duplicated.

          Anyway, use this and proceed from there. Group 1 captures the first formula; the backreference \1 then requires the same text to appear a second time.

          Terry

          PS I should also add that the reason your regex isn’t working is that you are using the DOT character, which is too generic. Note that my regex uses a negated class containing whitespace and the comma, so it matches any character except those two. The issue here might be if a formula itself contains either of them. This is just one method; another is to gather characters only up to the point where a comma and space combination follows, which is called a lookahead.
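To see what the backreference does, here is a small Python sketch of the Find What pattern above, run against the sample line from the first post. It assumes Python's `re` module treats this pattern the same way Notepad++ does; the variable names are only for illustration.

```python
import re

line = r"For a function \(  g(x)=f(x)+k,  g(x)=f(x)+k,\) the function \(  f( x )  f( x )\) is shifted vertically \(  k  k\) units."

# \\\(      literal backslash followed by a literal "("
# ([^\s,]+) group 1: a run of characters that are neither whitespace nor commas
# ,\s*\1    a comma, optional whitespace, then the same text as group 1 again
pattern = re.compile(r"\\\(\s*([^\s,]+),\s*\1")

matches = [(m.group(0), m.group(1)) for m in pattern.finditer(line)]
for whole, formula in matches:
    print(repr(whole), "->", repr(formula))
```

Only the first expression matches here, because the other two contain no comma, and `f( x )` also contains spaces that the negated class refuses.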

            guy038

            Hello, @roy-simpson, @terry-r and All,

            As Terry said, it can be achieved with regular expressions. Here is my version, assuming there are ALWAYS two space characters between the two identical parts


            Given your INPUT text :

            For a function \(  g(x)=f(x)+k,  g(x)=f(x)+k,\) the function \(  f( x )  f( x )\) is shifted vertically \(  k  k\) units.
            

            With this regex S/R :

            FIND (?-s)(.+) \1

            REPLACE $1\x20

            It would produce this OUTPUT text :

            For a function \(  g(x)=f(x)+k, \) the function \(  f( x ) \) is shifted vertically \(  k \) units.
            

            Maybe you don’t even need the trailing space? If so, simply use $1 as the replacement.

            Best Regards,

            guy038
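A quick way to check this search/replace outside Notepad++ is a Python sketch like the one below. It assumes Python's `re` behaves like the editor's engine for this pattern; `(?-s)` is left out because `.` already stops at line breaks in Python unless `re.DOTALL` is set.

```python
import re

line = r"For a function \(  g(x)=f(x)+k,  g(x)=f(x)+k,\) the function \(  f( x )  f( x )\) is shifted vertically \(  k  k\) units."

# (.+) \1 : some text, one space, then the same text again.
pattern = re.compile(r"(.+) \1")

matches = [m.group(0) for m in pattern.finditer(line)]
# Same effect as the replacement $1\x20: keep group 1 and add one space.
result = pattern.sub(lambda m: m.group(1) + " ", line)

print(matches)
print(result)
```

Because the pattern is not anchored to the \( \) delimiters, it also matches the "s s" in "is shifted", so in Python the prose comes out as "is hifted". Restricting the match to the text between \( and \) avoids that side effect.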

              Terry R @Roy Simpson

              @Roy-Simpson said in RegEx Quandary:

              What am I missing?

              As you will see from the two solutions presented so far, it is possible. They are similar, and the reason each can identify the duplication is the \1 backreference in the regex. I suggest reading one of the FAQ posts, namely this one. It has lots of links to regex info, including the site I mention below.

              As an interesting exercise, I loaded your example line and each of the regex solutions into regex101.com. This site checks regexes and often provides valuable descriptions of them. It also reports how many steps it took to find a match. @guy038’s solution matched in 9374 steps; however, it found 4 matches on that line (including some of the wording after the duplicated formula). My solution found 1 match in 10 steps. I only concentrated on the actual function, not the comment afterwards. Was I wrong?

              You say you have just started learning regex, well good on you for giving it a go. There is lots to learn, hopefully you stick with it. Each time you try, but maybe fall at the last step and then post like in this situation hopefully those who have gone before you will give you a hand up as we have done.

              Terry

                guy038

                Hi @roy-simpson, @terry-r and All,

                @terry-r you said :

                The site will also note how many steps it took to identify the string required. @guy038 solution matched in 9374 steps, however it found 4 matches on that line

                Personally, I found 3 matches only :

                • g(x)=f(x)+k, g(x)=f(x)+k

                • f( x ) f( x )

                • k  k (I replaced one space with an NBSP character in order to show two consecutive spaces between the two letters k)

                And where did you find this interesting feature which counts the number of steps to get a match ?!

                BR

                guy038

                P.S. :

                Don’t bother anymore, Terry! I understood: [image]

                  Roy Simpson @Terry R

                  @Terry-R Thank you for the reply on this. Unfortunately, any character is possible within the inline math code (including /, *, , ^, etc.).

                    Roy Simpson

                    @guy038 said in RegEx Quandary:

                    (?-s)(.+) \1

                    You both have been so great and this is definitely a steep learning curve for me. What you are seeing (the math code) is the result of my efforts using RegEx to convert

                    <math display="inline"><semantics><mrow> <mrow> <mtable columnalign="left"> <mtr columnalign="left"> <mtd columnalign="left"> <mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo><mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow> </mtd> </mtr></mtable></mrow></mrow><annotation-xml encoding="MathML-Content"><mrow> <mtable columnalign="left"> <mtr columnalign="left"> <mtd columnalign="left"> <mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo><mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow> </mtd> </mtr></mtable></mrow></annotation-xml></semantics></math>
                    

                    To

                    \( 0 = |4x + 1| - 7 \)
                    

                    Right now, it’s down to

                    \( 0 = |4x + 1| - 7  0 = |4x + 1| - 7 \)
                    

                    I would say it’s a huge win so far.

                      Roy Simpson

                      Okay, so now things are getting interesting. Just for a basic pattern match, I used

                      \\((.*?)\\)
                      

                      This is just to select the stuff between \( and \); however, that’s not even working correctly.

                        PeterJones @Roy Simpson

                        @Roy-Simpson said in RegEx Quandary:

                        \\((.*?)\\)
                        
                        \\((.*?)\\)
                        ^^ matches literal \
                        
                        \\((.*?)\\)
                          ^ starts a group
                           ^ starts another group
                        

                        You need to escape the literal ( and ) as well, otherwise they will be interpreted as a regex group, not as literals. Hence, you need \\\((.*?)\\\) to match from the literal backslash-paren through the backslash-close-paren in your example text.

                        If you don’t want the backslash-paren in your match, wrap the prefix and suffix in a positive lookbehind and a positive lookahead, like (?<=\\\()(.*?)(?=\\\)).

                        As a hint, when doing regex for matching backslashes and parens, I like using \x5C for backslash and \x28 and \x29 for the parens, so I don’t confuse the literals with the regex specials, so \x5C\x28(.*?)\x5C\x29 or (?<=\x5C\x28)(.*?)(?=\x5C\x29)
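The difference between the unescaped and escaped forms can be checked with a short Python sketch. It assumes Python's `re` parses these four patterns the same way Notepad++ does; the variable names are only for illustration, and the sample line is the one from the first post.

```python
import re

line = r"For a function \(  g(x)=f(x)+k,  g(x)=f(x)+k,\) the function \(  f( x )  f( x )\) is shifted vertically \(  k  k\) units."

# Unescaped parens: "\\" is a literal backslash, but "(" and ")" only open
# and close groups, so the match runs from one backslash to the next.
broken = re.compile(r"\\((.*?)\\)")
# Escaped parens: matches the literal \( ... \) delimiters.
fixed = re.compile(r"\\\((.*?)\\\)")
# Lookbehind and lookahead: the delimiters are required but not part of the match.
lookaround = re.compile(r"(?<=\\\()(.*?)(?=\\\))")
# Hex escapes: \x5C is the backslash, \x28 and \x29 are the parentheses.
hex_form = re.compile(r"\x5C\x28(.*?)\x5C\x29")

print(repr(broken.search(line).group(0)))  # ends at a backslash; the ")" is left out
print(fixed.findall(line))
print(lookaround.findall(line))
print(hex_form.findall(line))
```

The last three patterns capture the same three strings between the delimiters; only the unescaped one stops short of the closing parenthesis.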

                          Roy Simpson @Terry R

                          @PeterJones said in RegEx Quandary:

                          (?<=\\\()(.*?)(?=\\\))

                          OMG!!!

                          I knew it was something stupid on my part. I totally wasn’t counting those backslashes (\) properly (funny thing for a math professor).

                          Here is what I ended with:

                          FIND

                          \\\(\s*(?-s)([^\s].*?)\s*\1\s*\\\)
                          

                          REPLACE

                          \\\(\1\\\)
                          

                          THANK YOU SO MUCH!!! Have a wonderful New Year.
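For anyone who wants to try the final FIND/REPLACE pair outside the editor, here is a rough Python equivalent. It is a sketch under two assumptions: `(?-s)` is dropped because Python's `.` already excludes line breaks by default, and the replacement is written as a function to sidestep Python's own escaping rules for replacement strings.

```python
import re

line = r"For a function \(  g(x)=f(x)+k,  g(x)=f(x)+k,\) the function \(  f( x )  f( x )\) is shifted vertically \(  k  k\) units."

# \\\(  ...  \\\)   literal \( and \) delimiters
# ([^\s].*?)        group 1: shortest text that starts with a non-space character
# \s*\1\s*          optional whitespace, the same text again, optional whitespace
pattern = re.compile(r"\\\(\s*([^\s].*?)\s*\1\s*\\\)")

# Rebuild each expression from the delimiters and a single copy of the formula.
result = pattern.sub(lambda m: "\\(" + m.group(1) + "\\)", line)
print(result)
```

Since the pattern is anchored on both delimiters, text outside \( \) such as "is shifted" is left alone.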

                            guy038

                            Hello, @roy-simpson, @terry-r, @peterjones and All,

                            @roy-simpson, maybe the following regexes will help you to achieve your goal!

                            For the example below, don’t forget to tick the Wrap around option in the Replace dialog !


                            Given your last single line INPUT text, that you’ll paste in a new tab :

                            <math display="inline"><semantics><mrow> <mrow> <mtable columnalign="left"> <mtr columnalign="left"> <mtd columnalign="left"> <mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo><mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow> </mtd> </mtr></mtable></mrow></mrow><annotation-xml encoding="MathML-Content"><mrow> <mtable columnalign="left"> <mtr columnalign="left"> <mtd columnalign="left"> <mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo><mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow> </mtd> </mtr></mtable></mrow></annotation-xml></semantics></math>
                            

                            With this regex S/R :

                            FIND (?-s)(?:<.+?>)+([^< \r\n]|\Z)

                            REPLACE $1

                            You would be left with this tiny OUTPUT text :

                            0=|4x+1|−70=|4x+1|−7
                            

                            Now, with this simple regex S/R :

                            FIND .

                            REPLACE $0\x20

                            The OUTPUT becomes :

                            0 = | 4 x + 1 | − 7 0 = | 4 x + 1 | − 7 
                            

                            And finally, with this third and last regex S/R :

                            FIND (.+)\1

                            REPLACE \\\( $1\\\)

                            You get your expected OUTPUT :

                            \( 0 = | 4 x + 1 | − 7 \)
                            

                            Maybe get rid of the space character between 4 and x, to simulate an implicit multiplication sign!

                            Happy New Year !

                            Best Regards,

                            guy038
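The three search/replace passes above can be chained in a short Python sketch. It assumes Python's `re` handles these patterns like Notepad++ does, with `(?-s)` omitted because it is Python's default. The MathML string is rebuilt from the two repeated pieces of the input line, and the names `row`, `table` and `step1` to `step3` are only for illustration.

```python
import re

# The single-line MathML input, assembled from its repeated parts.
row = ('<mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo>'
       '<mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow>')
table = ('<mtable columnalign="left"> <mtr columnalign="left"> '
         '<mtd columnalign="left"> ' + row + ' </mtd> </mtr></mtable>')
mathml = "".join([
    '<math display="inline"><semantics><mrow> <mrow> ', table, '</mrow></mrow>',
    '<annotation-xml encoding="MathML-Content"><mrow> ', table,
    '</mrow></annotation-xml></semantics></math>',
])

# Pass 1: drop each run of tags, keeping the character that follows it.
step1 = re.sub(r"(?:<.+?>)+([^< \r\n]|\Z)", r"\1", mathml)
# Pass 2: put a space after every character.
step2 = re.sub(r".", lambda m: m.group(0) + " ", step1)
# Pass 3: collapse the doubled text and wrap it in \( ... \).
step3 = re.sub(r"(.+)\1", lambda m: "\\( " + m.group(1) + "\\)", step2)

print(step1)
print(step2)
print(step3)
```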

                            The Community of users of the Notepad++ text editor.