Add a space between a word and a hyphen stuck to its right side, as well as skip such instances in other parts
-
Block of text for testing:
<html lang="en"> <head> <meta http-equiv="Content- Type" content="text/html; charset=utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <META name="viewport" content="width=device-width, initial-scale=1" /> <h1>BOTHROPS</h1> <p style="color: black; font-family: Verdana,sans-serif; font-size: 18px; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; display: inline ! important; float: none;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p> Haemorrhages- dark Fear- of death E-mail us <h6>Remedies A- Z</h6> <ul> Some- list- here Dunking- donuts Seventytwo- houris </ul> <style type="text/css"> @media (min- width: 1281px) { .left { width: 180px; border-width:1px; border-style:solid; border-color:lightblue; padding-top:10px; } .right { width: 560px; border- width:1px; border- style:solid; border- color:lightblue; margin- top:0px; } } </style> <script type="text/javascript"> function googleTranslateElementInit() { new google.translate.TranslateElement({pageLanguage: 'en'}, 'google- translate- element'); } </script>
I tried this in the Find field:
(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-(\x20\w+)
with
$1 - $2
in the Replace field, to no avail.
How can I add a space between a word and a hyphen stuck to its right side, while skipping such instances in the other parts?
The resulting output should be:
Haemorrhages - dark Fear - of death
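For anyone who wants to try the core edit outside Notepad++, here is a minimal Python sketch. Python is an assumption, since the thread itself only uses the Notepad++ Find/Replace dialog. It ignores the parts that have to be skipped and shows only the basic rule: a hyphen that follows a word character and is followed by whitespace gets a space inserted before it. The names HYPHEN_STUCK_RIGHT and space_before_hyphen are made up for illustration.

```python
import re

# A hyphen preceded by a word character and followed by whitespace.
# No exclusions are handled here; this is only the basic rule.
HYPHEN_STUCK_RIGHT = re.compile(r"(?<=\w)-(?=\s)")


def space_before_hyphen(text: str) -> str:
    """Turn 'word- next' into 'word - next'; 'E-mail' is left alone."""
    return HYPHEN_STUCK_RIGHT.sub(" -", text)


if __name__ == "__main__":
    print(space_before_hyphen("Haemorrhages- dark Fear- of death E-mail us"))
    # prints: Haemorrhages - dark Fear - of death E-mail us
```

Because both lookarounds are zero-width, only the hyphen itself is matched, so no capture groups are needed in the replacement.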
-
@dr-ramaanand said in Add a space between a word and a hyphen stuck to its right side, as well as skip such instances in other parts:
I tried
(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-(\x20\w+)
with
$1 - $2
in the replace field to no avail
Two obvious things:
(<[\S\s]*?>)(*SKIP)(*F)
in your exclusions always matches everything to the end of the document and then fails, so it excludes everything. Take that out.
You have a lot of capturing groups, so
$1 - $2
(which refers to the first two exclusion groups) isn’t going to work. Less troublesome would be to change the last alternative from
(\w+)-(\x20\w+)
to
(?<=\w)-(?=\s)
Then you can replace with
\x20-
and not worry about capture groups at all.
Also, some tests won’t work unless . matches newline is checked, or you add
(?s)
to the beginning.
This:
Find what:
(?s)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(?<=\w)-(?=\s)
Replace with:
\x20-
works on your test data.
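Python's standard re module has no (*SKIP)(*F) verbs, so the sketch below swaps in a different, commonly used technique: match the regions to skip inside one capturing group and hand them back unchanged from a callback, while the bare hyphen is replaced. The exclusions and the (?<=\w)-(?=\s) target are taken from the Find expression above; the shortened sample text and the names SKIP_PARTS and fix_hyphens are hypothetical.

```python
import re

# Regions taken from the Find expression above; hyphens inside them are left alone.
SKIP_PARTS = [
    r"<html[\S\s]*?</h1>",
    r"<p[^>]*>[\S\s]*?uses\]</p>",
    r"<h6[^<>]*>.*?</h6>",
    r"<ul.*?</ul>",
    r"<style[\S\s]*?</style>",
    r"<script[\S\s]*?</script>",
]

# Alternation: a skipped region (captured as "skip") or the target hyphen.
# re.DOTALL plays the role of the leading (?s).
PATTERN = re.compile(
    r"(?P<skip>" + "|".join(SKIP_PARTS) + r")|(?<=\w)-(?=\s)",
    re.DOTALL,
)


def fix_hyphens(text: str) -> str:
    """Insert a space before 'word-' hyphens outside the skipped regions."""

    def repl(m: re.Match) -> str:
        if m.group("skip") is not None:
            return m.group("skip")  # excluded region: put it back unchanged
        return " -"  # matched hyphen: same effect as replacing with \x20-

    return PATTERN.sub(repl, text)


# Shortened, hypothetical version of the test block from the first post.
sample = (
    '<html lang="en"> <head> <meta http-equiv="Content- Type" /> <h1>BOTHROPS</h1> '
    '<p style="color: black;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p> '
    "Haemorrhages- dark Fear- of death E-mail us <h6>Remedies A- Z</h6> "
    "<ul> Some- list- here </ul> <style>.right { border- width:1px; }</style>"
)

if __name__ == "__main__":
    print(fix_hyphens(sample))
```

Since re.sub resumes scanning after each match, consuming an excluded region and returning it unchanged has the same effect here as skipping over it.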
-
@Coises Thanks a lot. I also got two more solutions, both regular expressions, from someone at www.regex101.com.
One solution was to use this in the Find field:
(?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-\x20\b
with
$11 - $12
in the Replace field.
Another was to use this in the Find field:
(?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|\w+\K-\x20\b
with
\x20-\x20
(a space, a hyphen and a space) in the Replace field.
@Coises I am posting those solutions here so that someone may find them useful later (since this webpage can be found online).
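Why $11 rather than $1? Every (...) in the exclusions is a capturing group, even though those branches never produce a match, so the word before the hyphen ends up in group 11 and there is no group 12. The Python sketch below counts the groups and then mimics the first solution with a callback. Python is an assumption here: its re module cannot compile (*SKIP)(*F), so the verbs are dropped (they are not groups, so the count is unchanged) and a matched exclusion is simply returned as-is. The (?x) flag is dropped too, since these patterns contain no whitespace for it to ignore. Python's re also has no \K; a lookbehind such as (?<=\w) plays a similar role. EXCLUSIONS, TARGET and fix are names made up for this sketch.

```python
import re

# The ten exclusions from the regex101 Find expression, minus (*SKIP)(*F):
# Python's re cannot compile those verbs, and they are not groups anyway.
EXCLUSIONS = [
    r"(<html[\S\s]*?<\/h1>)",
    r"(<p[^>]*>[\S\s]*?uses\]<\/p>)",
    r"(<[\S\s]*?>)",
    r"(E-mail)",
    r"(<h6[^<>]*>.*?<\/h6>)",
    r"(A-Z)",
    r"(<a\s[^>]*href.*?<\/a>)",
    r"(2009-2024)",
    r"(<style[\S\s]*?<\/style>)",
    r"(<script[\S\s]*?<\/script>)",
]
TARGET = r"(\w+)-\x20\b"  # the word before the hyphen is captured here

PATTERN = re.compile("|".join(EXCLUSIONS + [TARGET]))


def fix(text: str) -> str:
    """Mimic replacing with $11 - $12, where group 12 does not exist."""

    def repl(m: re.Match) -> str:
        if m.group(11) is not None:
            return m.group(11) + " - "  # word, space, hyphen, space
        return m.group(0)  # an exclusion matched: keep it unchanged

    return PATTERN.sub(repl, text)


if __name__ == "__main__":
    print(PATTERN.groups)  # 11
    print(fix("Haemorrhages- dark Fear- of death E-mail us"))
```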
-
Warning: wherever the regular expressions mentioned above did not find anything, they replaced everything with what was typed in the Replace field. I therefore restored everything from a backup, added “Czeslawski- Lewinski” in a part that was not skipped while searching, and made the replacements; I then removed “Czeslawski- Lewinski”. I chose those words (Polish-American names, actually) because they are unique.
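The practical lesson of the warning above is to keep a backup and check whether anything matched before trusting the result. If the same edit is done from a script instead of the Notepad++ dialog (an assumption, not what was used in this thread), a minimal Python sketch could make the backup first and report the number of replacements returned by re.subn. It does not reproduce or explain the Notepad++ behaviour described above; the function name replace_with_backup and the ".bak" naming are hypothetical.

```python
import re
import shutil
import tempfile
from pathlib import Path


def replace_with_backup(path: str, pattern: str, repl: str, flags: int = 0) -> int:
    """Back up PATH, then apply a regex replacement to it.

    The copy is written to PATH + ".bak" (a hypothetical naming scheme) before
    anything else happens. Returns how many replacements re.subn made; the
    file is only rewritten when that count is above zero.
    """
    src = Path(path)
    backup = src.with_name(src.name + ".bak")
    shutil.copyfile(src, backup)  # keep the original safe first
    text = src.read_text(encoding="utf-8")
    new_text, count = re.subn(pattern, repl, text, flags=flags)
    if count > 0:
        src.write_text(new_text, encoding="utf-8")
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        page = Path(tmp) / "page.html"
        page.write_text("Fear- of death E-mail us", encoding="utf-8")
        n = replace_with_backup(str(page), r"(?<=\w)-(?=\s)", " -")
        print(n, page.read_text(encoding="utf-8"))
```

Note that Python's replacement strings do not accept \x20; a literal space is used instead.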
-
Hello, @dr-ramaanand, @coises and All,
I tried to simplify @coises's search regex and ended up with this one:
(?s-i)(<(.+?)[> ].*?(?:/>|</\2>))(*SKIP)(*F)|(?-s).+\R
So, given your INPUT text:
<html lang="en"> <head> <meta http-equiv="Content- Type" content="text/html; charset=utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <META name="viewport" content="width=device-width, initial-scale=1" /> <h1>BOTHROPS</h1> <p style="color: black; font-family: Verdana,sans-serif; font-size: 18px; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; display: inline ! important; float: none;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p> Haemor- rhages- dark Fear- of death E-mail us <h6>Remedies A- Z</h6> <ul> Some- list- here Dunking- donuts Seventytwo- houris </ul> <style type="text/css"> @media (min- width: 1281px) { .left { width: 180px; border-width:1px; border-style:solid; border-color:lightblue; padding-top:10px; } .right { width: 560px; border- width:1px; border- style:solid; border- color:lightblue; margin- top:0px; } } </style> <script type="text/javascript"> function googleTranslateElementInit() { new google.translate.TranslateElement({pageLanguage: 'en'}, 'google- translate- element'); } </script>
This regex just matches the three consecutive lines below:
Haemor- rhages- dark
Fear- of death
E-mail us
Note that I deliberately added another string, r-, followed by a space character, for testing!
Thus, the following regex S/R:
SEARCH
(?s-i)(<(.+?)[> ].*?(?:/>|</\2>))(*SKIP)(*F)|(?<=\w)-(?=\x20)
REPLACE
\x20-
will replace, in these three lines ONLY, any string letter-, followed by a space character, with the string letter -, followed by a space character.
Best Regards,
guy038
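guy038's S/R can be mimicked in Python the same way as the other exclusion lists. The standard re module has no (*SKIP)(*F), so the tag alternative is captured and returned unchanged, and the bare hyphen becomes a space plus hyphen. The backreference \2 to the tag name carries over as-is, and (?s-i) becomes re.DOTALL, since Python's re is case-sensitive by default. Python itself, the sample string (shortened from the INPUT text above) and the name fix_hyphens_outside_tags are assumptions for this sketch.

```python
import re

# guy038's SEARCH, with (*SKIP)(*F) replaced by an ordinary capture (group 1).
# Group 2 is the tag name, reused by the backreference \2 to find the closing tag.
PATTERN = re.compile(r"(<(.+?)[> ].*?(?:/>|</\2>))|(?<=\w)-(?=\x20)", re.DOTALL)


def fix_hyphens_outside_tags(text: str) -> str:
    """Put a space before 'letter-' hyphens that are followed by a space."""

    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return m.group(1)  # a whole element or tag: leave it untouched
        return " -"  # same effect as REPLACE \x20-

    return PATTERN.sub(repl, text)


sample = (
    "<h1>BOTHROPS</h1> "
    '<p style="color: black;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p> '
    "Haemor- rhages- dark Fear- of death E-mail us <h6>Remedies A- Z</h6> "
    "<ul> Some- list- here Dunking- donuts </ul>"
)

if __name__ == "__main__":
    print(fix_hyphens_outside_tags(sample))
```

The appeal of this form is that no tag names are hard-coded: any element from its opening tag to the matching closing tag (or a self-closing tag) is skipped.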