Regex: How to remove the hyphens from the title of a hyperlink?
-
I have several lines with this kind of hyperlink:
<p class="mb-40px"><a href="my-name-is-prince.html">My-name-is-prince</a></p>
I want to use a regex to remove the hyphens (dashes) from the link title, replacing each one with a space.
The output should be:
<p class="mb-40px"><a href="my-name-is-prince.html">My name is prince</a></p>
-
I found a solution:
FIND:
(?-s)(\G(?!^)|html">)((?!</a).)*?\K[-]
REPLACE BY:
\x20
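For readers who want to check the expected result outside Notepad++, here is a minimal Python sketch. Python's standard re module has neither \G nor \K, so this is not the regex above: it swaps in a callback that rewrites only the text between the opening tag and </a>. The function name and the anchor pattern are assumptions made for illustration.

```python
import re

# Matches one anchor element: group 1 = opening tag, group 2 = link text,
# group 3 = closing tag. This pattern is an assumption for the sketch.
ANCHOR = re.compile(r'(<a\s+href="[^"]*">)(.*?)(</a>)')

def dashes_to_spaces(line: str) -> str:
    """Replace every dash in the link text only; the href stays untouched."""
    def fix(m: re.Match) -> str:
        return m.group(1) + m.group(2).replace("-", " ") + m.group(3)
    return ANCHOR.sub(fix, line)

line = '<p class="mb-40px"><a href="my-name-is-prince.html">My-name-is-prince</a></p>'
print(dashes_to_spaces(line))
```

The dashes in class="mb-40px" and in the href value are left alone because only group 2 is modified.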
-
Hello, @hellena-crainicu and All,
You said :
I find a solution :
FIND:
(?-s)(\G(?!^)|html">)((?!</a).)*?\K[-]
REPLACE BY:
\x20
I was a bit intrigued, so I dug into your solution a little.
-
First, there is no need to place the dash between square brackets
-
Secondly, to be rigorous, it would be better to use the exact <a href=".......">........</a> definition and place it as the first alternative. In addition, if we use a non-capturing group with the (?-i) ( case-sensitive ) modifier inside this group, this leads to this equivalent search regex :
(?s)(?-i:<a\x20href=".+?">|\G(?!^))((?!</a>).)*?\K-
- Thirdly, as you’re using the (?-s) modifier, the dot cannot match the EOL char(s). So, after the last character of each line, the \G assertion will not be true on the next line, because the match would need to cross the EOL char(s) to get there. Hence, from the beginning of each line, we have to find an <a href.... definition first. In this case, there is no need to bound the search region with the negative look-ahead (?!</a>), as long as no dash follows </a> on the same line, as in your sample.
So, your regex could be simplified as :
(?-s)(?-i:<a\x20href=".+?">|\G(?!^)).*?\K-
However, note that if you use the (?s) single_line modifier, you must use the negative look-ahead (?!</a>) to limit the action of your multi-line search :
(?s)(?-i:<a\x20href=".+?">|\G(?!^))((?!</a>).)*?\K-
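The effect of the two modifiers and of the look-ahead can be tried in Python, whose re module accepts (?s) and negative look-aheads ( but not \G or \K ). This sketch isolates only the ((?!</a>).)* part of the regex, written greedily; the sample string is made up for the demonstration.

```python
import re

text = "My-\nname</a> other-text"

# Tempered dot: consume one char at a time while "</a>" does not start here.
region = r"(?:(?!</a>).)*"

# Without (?s), the dot stops at the line break.
without_s = re.match(region, text).group()

# With (?s), the dot crosses the line break; the look-ahead stops it at </a>.
with_s = re.match("(?s)" + region, text).group()

# With (?s) and no look-ahead, nothing stops the match before the end of text.
unbounded = re.match(r"(?s).*", text).group()

print(repr(without_s))
print(repr(with_s))
print(repr(unbounded))
```

The third case is why the look-ahead becomes mandatory as soon as the dot is allowed to cross line breaks.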
Now, in the topic below, we already tried to normalize this kind of regex !
- Let FR ( Find Regex ) be the regex which defines the char, string or expression to be searched
- Let RR ( Replacement Regex ) be the regex which defines the char, string or expression which must replace the FR expression
- Let BSR ( Begin Search-region Regex ) be the regex which defines the beginning of the area where the search for FR must start
- Let ESR ( End Search-region Regex ) be the regex which defines, implicitly, the area where the search for FR must end
Then, the generic regex can be expressed :
SEARCH (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
REPLACE RR
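As a rough illustration of how the three parts slot into the generic regex, here is a small Python helper that only assembles the search string. The function name build_search is hypothetical, and the resulting string is meant to be pasted into the Notepad++ search field, not passed to Python's own re module, which does not support \G or \K.

```python
def build_search(bsr: str, esr: str, fr: str) -> str:
    """Assemble the generic search regex from its BSR, ESR and FR parts."""
    return (
        r"(?-i:" + bsr + r"|(?!\A)\G)"   # start of the region, or continue from the last match
        + r"(?s:(?!" + esr + r").)*?"    # any chars, as long as ESR does not start here
        + r"\K(?-i:" + fr + r")"         # forget what was consumed, then match FR
    )

# The parts used in this thread: BSR = opening <a href> tag, ESR = </a>, FR = a dash.
search = build_search(r'<a\x20href=".+?">', r"</a>", r"-")
print(search)
```

With these three inputs, the printed string is (?-i:<a\x20href=".+?">|(?!\A)\G)(?s:(?!</a>).)*?\K(?-i:-)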
So I was curious to compare our previous syntax with yours, which is :
SEARCH (?-i:BSR|\G(?!^))(?s:(?!ESR).)*?\K(?-i:FR)
REPLACE RR
After some tests, I must say that your syntax \G(?!^), which can also be expressed as (?!^)\G, seems more accurate and practical than (?!\A)\G. Let me explain :
When you perform a Replace All or a Mark All operation, you simply have to tick the Wrap around option to get the correct results / replacements !
But, if you just use the Find Next button :
- With the (?!\A)\G syntax, you need to move the caret to the very beginning of the file in order to get a correct match ELSE you may match some incorrect FR
- With the (?!^)\G syntax, you need to move the caret to the beginning of any line in order to get a correct match ELSE any start from position > 1 may match some incorrect FR
In other words :
- With the @hellena-crainicu syntax, associated to \G, if you are at the beginning of any line, a first hit on the Find Next button will always give you a correct match
- With our previous syntax, associated to \G, you must be at the very beginning of the file for a first hit on the Find Next button to give you a correct match
To be convinced :
- Select the Mark dialog ( Ctrl + M )
- Untick the Wrap around option ( IMPORTANT )
- Tick the Purge for each search option
- Move the caret to the beginning, or not, of the first line or of the subsequent lines ( a FR part must be present in some lines to see the differences ! )
- For each case, note all the matches after a click on the Mark All button, for both methods :
(?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)
and
(?-i:BSR|(?!^)\G)(?s:(?!ESR).)*?\K(?-i:FR)
Best Regards,
guy038
-
-
@guy038 THANKS
-
Hi, @hellena-crainicu and All,
Let me expand on my previous post. Here is a real example, based on the @hellena-crainicu problem !
In this example, I supposed that @hellena-crainicu wanted to search for any dash symbol contained in the • region of the tag <a href="...........">••••••••••••••</a>, in a multi-line text, so using the (?s) single_line modifier.
In a new tab, paste the 23-line text below :
This-is -- a- test <p class="mb-40px"><a href="my-na me-is-prince.html">My- name - is---pr ince</a></p> <p class="mb-40px"><a href=" my-name-is-prince. html">M y-name -- is-prince </a></p>
Now, we must detect the differences between the two regexes :
- Regex A : (?s)(?-i:<a\x20href=".+?">|\G(?!\A))((?!</a>).)*?\K- ( The syntax used up to now )
and
- Regex B : (?s)(?-i:<a\x20href=".+?">|\G(?!^))((?!</a>).)*?\K- ( The @hellena-crainicu syntax )
- Open the Mark dialog ( Ctrl + M )
- Untick all options
- Tick the Purge for each search AND Wrap around options
- Select the Regular expression search mode
- Click on the Mark All button
=> Message Mark: 9 matches in entire file, corresponding to the 9 dashes between the > and </a>, in the two multi-line blocks beginning with <p class. This is correct !
In the same way, if the Wrap around option is ticked, a replacement of each dash by a space char would correctly give the message Replace All: 9 occurrences were replaced in entire file
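The figure of 9 can be cross-checked outside Notepad++ with a short Python sketch. It does not use \G or \K ( Python's re module lacks both ): it simply extracts the text of each link and counts the dashes in it. The sample is written here with a space in place of each line break, which is an assumption that does not affect the count.

```python
import re

# The sample text, with each line break written as a space (assumption).
sample = (
    'This-is -- a- test '
    '<p class="mb-40px"><a href="my-na me-is-prince.html">My- name - is---pr ince</a></p> '
    '<p class="mb-40px"><a href=" my-name-is-prince. html">M y-name -- is-prince </a></p>'
)

# Text between the end of each opening <a href="..."> tag and its </a>.
link_texts = re.findall(r'(?s)<a\x20href=".+?">(.*?)</a>', sample)

# Number of dashes inside each link text, then the total.
counts = [t.count("-") for t in link_texts]
print(counts, sum(counts))
```

The two link texts contain 5 and 4 dashes respectively, which gives the 9 matches reported by the Mark dialog.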
Now, let’s see the differences when using the Mark dialog, with the Wrap around option unticked and the Purge for each search option still ticked.
Here are, below, some results depending on the caret’s position ( Line x, column y ), right before a click on the Mark All button :
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
| Caret position     | Regex A matches    | Regex B matches    | Regex A      | Regex B      | Observations                     |
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
| Line 1, column 1   | 9 matches          | 9 matches          | OK           | OK           | Beginning of **file** and line   |
| Line 1, column 2   | 17 matches         | 17 matches         | ko           | ko           |                                  |
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
| Line 2, column 1   | 16 matches         | 9 matches          | ko           | OK           | Beginning of line                |
| Line 2, column 2   | 15 matches         | 15 matches         | ko           | ko           |                                  |
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
| Line 3, column 1   | 14 matches         | 9 matches          | ko           | OK           | Beginning of **empty** line      |
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
| Line 4, column 1   | 14 matches         | 9 matches          | ko           | OK           | Beginning of line                |
| Line 4, column 2   | 14 matches         | 14 matches         | ko           | ko           |                                  |
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
| Line 5, column 1   | 13 matches         | 9 matches          | ko           | OK           | Beginning of line                |
| Line 5, column 2   | 13 matches         | 13 matches         | ko           | ko           |                                  |
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
| Line 6, column 1   | 13 matches         | 9 matches          | ko           | OK           | Beginning of **empty** line      |
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
| Line 7, column 1   | 13 matches         | 9 matches          | ko           | OK           | Beginning of line                |
| Line 7, column 2   | 13 matches         | 13 matches         | ko           | ko           |                                  |
•--------------------•--------------------•--------------------•--------------•--------------•----------------------------------•
Note that the exact message is : Mark: xx matches from caret to end-of-file
It’s easy to notice that the @hellena-crainicu syntax ( Regex B ) gives more correct results than the previous one ( Regex A ), when the Wrap around option is not checked ;-))
Best Regards
guy038
-
Hi, @hellena-crainicu and All,
I did additional tests and, sorry Hellena, but using your negative look-ahead (?!^), instead of (?!\A), may miss matches in some cases, too !
Indeed, imagine that the searched string would just be the EOL char(s), with the following regex :
SEARCH (?s)(?-i:<a\x20href=".+?">|\G(?!^))((?!</a>).)*?\K\R
Then, the part \G(?!^)((?!</a>).)*?, before a next match of line-ending chars, would never occur, as the range, after \G, should start at the beginning of a line, which is precisely what the \G(?!^) syntax forbids !
Finally, the present (?!\A) syntax is preferable. We do not even need to bother about the status of the Wrap around option. Just ONE rule :
- Move to the very beginning of the current file, with the Ctrl + Home shortcut, before applying this specific S/R !
You may test the regex :
(?s)(?-i:<a\x20href=".+?">|\G(?!\A))((?!</a>).)*?\K\R
( and your version (?s)(?-i:<a\x20href=".+?">|\G(?!^))((?!</a>).)*?\K\R )
against the 23-line text of my previous post to see the obvious differences !
BR
guy038
P.S. I’m about to send an e-mail to @peterjones to know where this specific S/R should be placed. Probably, at this location :
-
@guy038 I use https://chat.openai.com/ to find different solutions. ChatGPT learns everything. In about 5 seconds it generates another 4 solutions.
I just gave it your regex as an example and asked ChatGPT to write me another 4 solutions. It is the most intelligent tool ever. Artificial intelligence.
FIND:
(?-s)(\G(?!^)|html">)((?!</a>).)*?\K-
REPLACE BY:
\x20
FIND:
(?-s)(\G(?!^)|html">)((?!</a>).)*?\K-
REPLACE BY:
\x20
FIND:
(?-s)(\G(?!^)|html">)((?!</a>).)*?\K-
REPLACE BY:
\x20
FIND:
(?-s)(\G(?!^)|html">)((?!</a>).)*?\K-
REPLACE BY:
\x20