RegEx Quandary
-
Hello everyone.
I have an HTML document that contains the following:
For a function \( g(x)=f(x)+k, g(x)=f(x)+k,\) the function \( f( x ) f( x )\) is shifted vertically \( k k\) units.
Due to the importing process, all the inline math expressions are double their length (see the repeated text within ( ) ).
I am new to RegEx, but I am trying the following:
Find:
\\(\s*(.{1,}?)(.{1,})\s*\\)
Replace:
\\(\1\\)
The intent is to only return the first half of the text inside ( ).
For some reason, the Find and Replace is highlighting all text from the first occurrence of ( to the final occurrence of ) on a line. What am I missing?
Thanks in advance!
-
After a few more moments of searching the interwebs, I don’t think this is possible via RegEx alone. I would need an automated way to split a string in half, and I believe RegEx cannot compute the length of a string.
-
@Roy-Simpson said in RegEx Quandary:
I don’t think this is possible via RegEx alone
Well, it can be done with Regex.
So far I have Find What:
\\\(\s*([^\s,]+),\s*\1
I haven’t presented a finished solution, partly because I’m not sure about the number of spaces before the formula and the comma after it, and whether they are exactly duplicated or not. Anyway, use this and proceed from there. Group 1 is the first formula; the reference \1 means to find the same text again when it is duplicated.
Terry
PS I should also add that the reason your regex isn’t working is that you are using the DOT character, which is too generic. Note that my regex uses a negated class containing the space and comma, so it matches any character but these 2. The issue here might be if your formula uses either within the actual formula code. This is just one method; another might be to gather characters only up to the point where a comma and space combination follows, which is called a lookahead.
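For anyone who wants to try Terry's pattern outside Notepad++, here is a minimal sketch in Python. Using Python's `re` module is an assumption (the thread itself is about Notepad++ search and replace), and the variable names and the lookahead variant are illustrative only, not Terry's finished solution.

```python
import re

# One inline-math span from the question, with the formula duplicated.
text = r"For a function \( g(x)=f(x)+k, g(x)=f(x)+k,\) the function"

# Terry's pattern: literal "\(", optional whitespace, then group 1 = a run of
# characters that are neither whitespace nor comma, a comma, optional
# whitespace, and finally \1 = "the same text as group 1 again".
negated_class = re.compile(r"\\\(\s*([^\s,]+),\s*\1")

# Illustrative lookahead variant (an assumption, not from the thread): take
# characters lazily until a comma followed by whitespace comes next.
lookahead = re.compile(r"\\\(\s*(.+?)(?=,\s),\s*\1")

first_formula = negated_class.search(text).group(1)
first_formula_la = lookahead.search(text).group(1)
print(first_formula)
print(first_formula_la)
```

Both patterns only succeed because of the trailing `\1`; without it, group 1 would capture the first formula whether or not it is repeated.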
-
Hello, @roy-simpson, @terry-r and All,
As Terry said, it can be achieved with regular expressions. Here is my version, assuming there are ALWAYS two space characters between the two identical parts.
Given your INPUT text :
For a function \( g(x)=f(x)+k, g(x)=f(x)+k,\) the function \( f( x ) f( x )\) is shifted vertically \( k k\) units.
With this regex S/R :
FIND
(?-s)(.+) \1
REPLACE
$1\x20
It would produce this OUTPUT text :
For a function \( g(x)=f(x)+k, \) the function \( f( x ) \) is shifted vertically \( k \) units.
Maybe you don’t even need the trailing space ? If so, simply use $1 in the replacement.
Best Regards,
guy038
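The same search and replace can be sketched in Python to see what the backreference does. This assumes the two copies are separated by exactly two spaces, as the post states, so the input below is written with two spaces between the copies. Python's `re` is an assumption too: it has no `$1`, so the replacement uses `\1`, and `(?-s)` is left out because the dot already stops at line breaks by default.

```python
import re

# Assumption: the two copies are separated by exactly TWO spaces, as the post
# states. \x20{2} spells the two spaces out so they survive copy and paste.
line = (r"For a function \( g(x)=f(x)+k,  g(x)=f(x)+k,\) the function "
        r"\( f( x )  f( x )\) is shifted vertically \( k  k\) units.")

dedupe = re.compile(r"(.+)\x20{2}\1")
result = dedupe.sub(r"\1 ", line)
print(result)

# With a single space as separator the pattern is too permissive: in
# "is shifted" the text "s s" is also "something, a space, the same thing".
single = line.replace("  ", " ")
loose_matches = [m.group(0) for m in re.finditer(r"(.+) \1", single)]
print(loose_matches)
```

The second half is a caution: if the separator is reduced to one space, ordinary prose such as "is shifted" starts to match as well, so the two-space assumption matters.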
-
@Roy-Simpson said in RegEx Quandary:
What am I missing?
As you will see from the 2 solutions presented thus far, it is possible. Note that they are similar, but the actual reason they can identify the duplication is the \1 that follows the group in each regex. I would suggest reading one of the FAQ posts, namely this one. There are lots of links to regex info, including the site I mention below.
As an interesting exercise I loaded your example line and each of the regex solutions into regex101.com. This site will check regexes and often provides valuable descriptions of the regex. The site will also note how many steps it took to identify the string required. @guy038's solution matched in 9374 steps; however, it found 4 matches on that line (including the wording after the duplicated formula). My solution found 1 match in 10 steps. I only concentrated on the actual function, not the comment afterwards; was I wrong?
You say you have just started learning regex; well, good on you for giving it a go. There is lots to learn, and hopefully you stick with it. Each time you try but fall at the last step and then post, as in this situation, hopefully those who have gone before you will give you a hand up, as we have done.
Terry
-
Hi @roy-simpson, @terry-r and All,
@terry-r you said :
The site will also note how many steps it took to identify the string required. @guy038's solution matched in 9374 steps; however, it found 4 matches on that line
Personally, I found 3 matches only :
- g(x)=f(x)+k, g(x)=f(x)+k
- f( x ) f( x )
- k k
(I replaced the space with an NBSP char in order to show two consecutive spaces between the two letters k.)
And where did you find this interesting feature which counts the number of steps to get a match ?!
BR
guy038
P.S. :
Don’t bother anymore, Terry ! I understood :
-
@Terry-R Thank you for the reply on this. Unfortunately, any character is possible within the inline math code (including /, *, , ^, etc.).
-
@guy038 said in RegEx Quandary:
(?-s)(.+) \1
You both have been so great and this is definitely a steep learning curve for me. What you are seeing (the math code) is the result of my efforts using RegEx to convert
<math display="inline"><semantics><mrow> <mrow> <mtable columnalign="left"> <mtr columnalign="left"> <mtd columnalign="left"> <mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo><mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow> </mtd> </mtr></mtable></mrow></mrow><annotation-xml encoding="MathML-Content"><mrow> <mtable columnalign="left"> <mtr columnalign="left"> <mtd columnalign="left"> <mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo><mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow> </mtd> </mtr></mtable></mrow></annotation-xml></semantics></math>
To
\( 0 = |4x + 1| - 7 \)
Right now, it’s down to
\( 0 = |4x + 1| - 7 0 = |4x + 1| - 7 \)
I would say it’s a huge win so far.
-
Okay, so now things are getting interesting. Just for a basic pattern match, I used
\\((.*?)\\)
This is just to select the stuff between \( and \); however, that’s not even working correctly.
-
@Roy-Simpson said in RegEx Quandary:
\\((.*?)\\)
\\((.*?)\\)
^^ matches a literal \
\\((.*?)\\)
  ^ starts a group
   ^ starts another group
You need to escape the literal ( and ) as well; otherwise they will be interpreted as a regex group, not as literals. Hence, you need
\\\((.*?)\\\)
to match the literal backslash-paren through backslash-close-paren in your example text.
If you don’t want the backslash-paren in your match, wrap the prefix and suffix in a positive lookbehind and lookahead, like
(?<=\\\()(.*?)(?=\\\))
As a hint, when doing regex for matching backslashes and parens, I like using \x5C for the backslash and \x28 and \x29 for the parens, so I don’t confuse the literals with the regex specials:
\x5C\x28(.*?)\x5C\x29
or
(?<=\x5C\x28)(.*?)(?=\x5C\x29)
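The difference between the patterns above can be checked with a small Python sketch. Python's `re` module and the variable names are assumptions for illustration; the patterns themselves are the ones quoted in this post.

```python
import re

text = r"shifted vertically \( k k\) units"

# Roy's attempt: \\ is a literal backslash, but the bare ( opens a group,
# so the pattern really means "backslash, anything, backslash".
unescaped = re.search(r"\\((.*?)\\)", text).group(0)

# Escaping the parens too: backslash + "(" ... backslash + ")".
escaped = re.search(r"\\\((.*?)\\\)", text)

# Lookbehind/lookahead keep the delimiters out of the match itself.
inner = re.search(r"(?<=\\\()(.*?)(?=\\\))", text).group(0)

# Same as `escaped`, written with hex escapes for the three literals.
hexed = re.search(r"\x5C\x28(.*?)\x5C\x29", text).group(0)

print(repr(unescaped))
print(repr(escaped.group(0)), repr(escaped.group(1)))
print(repr(inner))
print(repr(hexed))
```

The unescaped version stops at the second backslash and never requires a closing paren, which is why it appears to select the wrong text.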
-
@PeterJones said in RegEx Quandary:
(?<=\\\()(.*?)(?=\\\))
OMG!!!
I knew it was something stupid on my part. I totally wasn’t counting those backslashes (\) properly (funny thing for a math professor).
Here is what I ended with:
FIND
\\\(\s*(?-s)([^\s].*?)\s*\1\s*\\\)
REPLACE
\\\(\1\\\)
THANK YOU SO MUCH!!! Have a wonderful New Year.
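As a rough cross-check of the final FIND/REPLACE pair, here is a sketch of the same idea in Python. It is an approximation under stated assumptions: Python's `re` is used instead of Notepad++, the `(?-s)` switch is dropped, and the replacement is built with a small function rather than the `\\\(\1\\\)` string.

```python
import re

line = (r"For a function \( g(x)=f(x)+k, g(x)=f(x)+k,\) the function "
        r"\( f( x ) f( x )\) is shifted vertically \( k k\) units.")

# Roy's FIND pattern. The (?-s) switch is dropped because Python's dot already
# stops at line breaks by default; [^\s] is written as the equivalent \S.
doubled = re.compile(r"\\\(\s*(\S.*?)\s*\1\s*\\\)")

# Rebuild each span as \( + first copy + \). A function is used for the
# replacement to avoid a second layer of backslash escaping.
fixed = doubled.sub(lambda m: "\\(" + m.group(1) + "\\)", line)
print(fixed)
```

Because the `\s*` parts sit outside group 1, the padding spaces just inside `\(` and `\)` are dropped from the result.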
-
Hello, @roy-simpson, @terry-r, @peterjones and All,
@roy-simpson, maybe the following regexes will help you to achieve your goal !
For the example below, don’t forget to tick the Wrap around option in the Replace dialog !
Given your last single line INPUT text, that you’ll paste in a new tab :
<math display="inline"><semantics><mrow> <mrow> <mtable columnalign="left"> <mtr columnalign="left"> <mtd columnalign="left"> <mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo><mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow> </mtd> </mtr></mtable></mrow></mrow><annotation-xml encoding="MathML-Content"><mrow> <mtable columnalign="left"> <mtr columnalign="left"> <mtd columnalign="left"> <mrow> <mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo><mn>1</mn><mo>|</mo><mo>−</mo><mn>7</mn> </mrow> </mtd> </mtr></mtable></mrow></annotation-xml></semantics></math>
With this regex S/R :
FIND
(?-s)(?:<.+?>)+([^< \r\n]|\Z)
REPLACE
$1
You would be left with this tiny OUTPUT text :
0=|4x+1|−70=|4x+1|−7
Now, with this simple regex S/R :
FIND
.
REPLACE
$0\x20
The OUTPUT becomes :
0 = | 4 x + 1 | − 7 0 = | 4 x + 1 | − 7
And finally, with this third and last regex S/R :
FIND
(.+)\1
REPLACE
\\\( $1\\\)
You get your expected OUTPUT :
\( 0 = | 4 x + 1 | − 7 \)
Maybe get rid of the space char between 4 and x, to simulate an implicit multiplication sign !
Happy New Year !
Best Regards,
guy038
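The three search/replace passes above can be sketched as one small Python script. Several things here are assumptions: Python's `re` stands in for Notepad++, the MathML is a hypothetical shortened sample with the same shape as the real line, and `\1` or a function replaces the `$1` syntax.

```python
import re

# Hypothetical shortened sample with the same shape as the MathML in the post:
# the presentation markup, then the same expression again in <annotation-xml>.
# \u2212 is the MINUS SIGN character used in the original.
body = ("<mn>0</mn><mo>=</mo><mo>|</mo><mn>4</mn><mi>x</mi><mo>+</mo>"
        "<mn>1</mn><mo>|</mo><mo>\u2212</mo><mn>7</mn>")
shell = ('<math display="inline"><semantics><mrow> {0} </mrow>'
         '<annotation-xml encoding="MathML-Content"><mrow> {0} </mrow>'
         '</annotation-xml></semantics></math>')
mathml = shell.format(body)

# Pass 1: swallow each run of tags (and the blanks between them), keeping only
# the first content character that follows, or nothing at the very end.
step1 = re.sub(r"(?:<.+?>)+([^< \r\n]|\Z)", r"\1", mathml)

# Pass 2: put a space after every character.
step2 = re.sub(r".", r"\g<0> ", step1)

# Pass 3: collapse "X followed by X" to one copy, wrapped in \( ... \).
step3 = re.sub(r"(.+)\1", lambda m: "\\( " + m.group(1) + "\\)", step2)

print(step1)
print(step2)
print(step3)
```

Pass 3 relies on the whole line being exactly two copies of the same text, so it is best run on one extracted expression at a time.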