Add a space between a word and a hyphen stuck to its right side, as well as skip such instances in other parts
-
Block of text for testing:-
<html lang="en"> <head> <meta http-equiv="Content- Type" content="text/html; charset=utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <META name="viewport" content="width=device-width, initial-scale=1" /> <h1>BOTHROPS</h1> <p style="color: black; font-family: Verdana,sans-serif; font-size: 18px; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; display: inline ! important; float: none;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p> Haemorrhages- dark Fear- of death E-mail us <h6>Remedies A- Z</h6> <ul> Some- list- here Dunking- donuts Seventytwo- houris </ul> <style type="text/css"> @media (min- width: 1281px) { .left { width: 180px; border-width:1px; border-style:solid; border-color:lightblue; padding-top:10px; } .right { width: 560px; border- width:1px; border- style:solid; border- color:lightblue; margin- top:0px; } } </style> <script type="text/javascript"> function googleTranslateElementInit() { new google.translate.TranslateElement({pageLanguage: 'en'}, 'google- translate- element'); } </script>
I tried
(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-(\x20\w+)
with
$1 - $2
in the replace field, to no avail.
How can I add a space between a word and a hyphen stuck to its right side, while skipping such instances in other parts?
The resultant output should be:
Haemorrhages - dark Fear - of death
@dr-ramaanand said in Add a space between a word and a hyphen stuck to its right side, as well as skip such instances in other parts:
I tried
(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-(\x20\w+)
with $1 - $2 in the replace field to no avail
Two obvious things:
(<[\S\s]*?>)(*SKIP)(*F) in your exclusions always matches everything to the end of the document and then fails, so it excludes everything. Take that out.
You have a lot of capturing groups, so $1 - $2 isn’t going to work. Less troublesome would be to replace (\w+)-(\x20\w+) with (?<=\w)-(?=\s); then you can replace with \x20- and not worry about capture groups at all.
Also, some tests won’t work unless ". matches newline" is checked, or you add (?s) to the beginning.
This:
Find what:
(?s)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(<ul.*?<\/ul>)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(?<=\w)-(?=\s)
Replace with:
\x20-
works on your test data.
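For anyone who wants to try the same idea outside Notepad++: Python's built-in re module has no (*SKIP)(*F), but a similar effect can be sketched by capturing the protected regions in one group and letting a replacement function hand them back unchanged. The function name space_before_hyphen and the shortened sample string are made up for illustration; the exclusions mirror the Find what expression above.

```python
import re

# Regions to leave untouched, mirroring the exclusions in the Find what
# expression (re.DOTALL plays the role of the leading (?s)).
PROTECTED = "|".join([
    r"<html.*?</h1>",
    r"<p[^>]*>.*?uses\]</p>",
    r"<h6[^<>]*>.*?</h6>",
    r"<ul.*?</ul>",
    r"<style.*?</style>",
    r"<script.*?</script>",
])
# A hyphen directly after a word character and directly before whitespace.
TARGET = r"(?<=\w)-(?=\s)"

PATTERN = re.compile("(" + PROTECTED + ")|" + TARGET, re.DOTALL)


def space_before_hyphen(text: str) -> str:
    """Insert a space before each target hyphen outside the protected regions."""
    def repl(m: re.Match) -> str:
        # Group 1 is set only when a protected region matched: return it as is.
        if m.group(1) is not None:
            return m.group(1)
        return " -"
    return PATTERN.sub(repl, text)


# Shortened, hypothetical version of the test data from the first post.
sample = (
    '<html lang="en"> <meta http-equiv="Content- Type" /> <h1>BOTHROPS</h1> '
    '<p style="font-size: 18px;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p> '
    'Haemorrhages- dark Fear- of death E-mail us '
    '<h6>Remedies A- Z</h6> <ul> Some- list- here </ul> '
    '<style type="text/css"> .right { border- width:1px; } </style> '
    "<script> f('google- translate- element'); </script>"
)

print(space_before_hyphen(sample))
```

Only "Haemorrhages- dark Fear- of death" should change; "E-mail" and everything inside the protected regions stay as they were.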
-
@Coises Thanks a lot. I also got two more regular expression solutions from someone at www.regex101.com.
One solution was to use this in the Find field:-
(?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|(\w+)-\x20\b
with
$11 - $12
in the Replace field.
Another was to use this in the Find field:-
(?x)(<html[\S\s]*?<\/h1>)(*SKIP)(*F)|(<p[^>]*>[\S\s]*?uses\]<\/p>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(E-mail)(*SKIP)(*F)|(<h6[^<>]*>.*?<\/h6>)(*SKIP)(*F)|(A-Z)(*SKIP)(*F)|(<a\s[^>]*href.*?<\/a>)(*SKIP)(*F)|(2009-2024)(*SKIP)(*F)|(<style[\S\s]*?<\/style>)(*SKIP)(*F)|(<script[\S\s]*?<\/script>)(*SKIP)(*F)|\w+\K-\x20\b
with
 - 
(a space, a hyphen and a space) in the Replace field.
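A note on where $11 comes from: replacement references count every capturing group in the Find expression from the left, including the groups in the exclusions. The small Python sketch below counts them for the first regex101 expression. Python's re module does not implement (*SKIP)(*F), so the verbs are left out here, which does not change the group numbering. It reports 11 groups, so the word sits in group 11 and there is no group 12; presumably $12 just expands to nothing in Notepad++.

```python
import re

# Exclusion alternatives copied from the first regex101 expression, with the
# (*SKIP)(*F) verbs removed because Python's re module does not support them.
exclusions = [
    r"(<html[\S\s]*?<\/h1>)",
    r"(<p[^>]*>[\S\s]*?uses\]<\/p>)",
    r"(<[\S\s]*?>)",
    r"(E-mail)",
    r"(<h6[^<>]*>.*?<\/h6>)",
    r"(A-Z)",
    r"(<a\s[^>]*href.*?<\/a>)",
    r"(2009-2024)",
    r"(<style[\S\s]*?<\/style>)",
    r"(<script[\S\s]*?<\/script>)",
]
target = r"(\w+)-\x20\b"

pattern = re.compile("|".join(exclusions + [target]))
print(pattern.groups)  # 11 capturing groups in total

m = pattern.search("Fear- of death")
print(m.lastindex, m.group(11))  # 11 Fear
```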
@Coises I am posting those solutions here so that someone may find them useful later (since this webpage can be found online).
-
Warning note: Wherever the regular expressions mentioned above did not find anything, they replaced everything with what was typed in the “Replace” field. I therefore restored everything from a back-up, added “Czeslawski- Lewinski” in a part that was not skipped while searching, and made the replacements; I then removed the “Czeslawski- Lewinski”. I chose those words (Polish-American names, actually) because they are unique.
-
Hello, @dr-ramaanand, @coises and All,
I tried to simplify the @coises search regex and I ended up with this search regex :
(?s-i)(<(.+?)[> ].*?(?:/>|</\2>))(*SKIP)(*F)|(?-s).+\R
So, given your INPUT text :
<html lang="en"> <head> <meta http-equiv="Content- Type" content="text/html; charset=utf-8" /> <meta http-equiv="X-UA-Compatible" content="IE=edge" /> <META name="viewport" content="width=device-width, initial-scale=1" /> <h1>BOTHROPS</h1> <p style="color: black; font-family: Verdana,sans-serif; font-size: 18px; font-style: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; display: inline ! important; float: none;">BOTHROPS LANCEOLATUS uses [Both-l uses]</p> Haemor- rhages- dark Fear- of death E-mail us <h6>Remedies A- Z</h6> <ul> Some- list- here Dunking- donuts Seventytwo- houris </ul> <style type="text/css"> @media (min- width: 1281px) { .left { width: 180px; border-width:1px; border-style:solid; border-color:lightblue; padding-top:10px; } .right { width: 560px; border- width:1px; border- style:solid; border- color:lightblue; margin- top:0px; } } </style> <script type="text/javascript"> function googleTranslateElementInit() { new google.translate.TranslateElement({pageLanguage: 'en'}, 'google- translate- element'); } </script>
This regex just matches the three consecutive lines, below :
Haemor- rhages- dark
Fear- of death
E-mail us
Note that I deliberately added another string r-, followed with a space character, for tests !
Thus, the following regex S/R :
SEARCH
(?s-i)(<(.+?)[> ].*?(?:/>|</\2>))(*SKIP)(*F)|(?<=\w)-(?=\x20)
REPLACE
\x20-
Will replace, in these three lines ONLY, any string letter-, followed with a space char, with the string letter - and a space char.
Best Regards,
guy038
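The generic search above can also be tried outside Notepad++. This is only a sketch under assumptions: Python's re module lacks (*SKIP)(*F), so the tag branch is captured and returned unchanged by a replacement function, and the (?s-i) prefix is replaced by the re.DOTALL flag (Python matching is case-sensitive by default). The name fix_hyphens and the shortened sample are illustrative, not part of the original posts.

```python
import re

GENERIC = re.compile(
    # Group 1: from "<name" up to the first "/>" or the matching "</name>";
    # group 2 holds the tag name used by the \2 backreference.
    r"(<(.+?)[> ].*?(?:/>|</\2>))"
    # Otherwise: a hyphen right after a word character and right before a space.
    r"|(?<=\w)-(?=\x20)",
    re.DOTALL,
)


def fix_hyphens(text: str) -> str:
    """Put a space before each hyphen that lies outside any tag range."""
    def repl(m: re.Match) -> str:
        # A tag range matched: keep it exactly as found.
        if m.group(1) is not None:
            return m.group(1)
        return " -"
    return GENERIC.sub(repl, text)


# Shortened, hypothetical version of the INPUT text.
sample = (
    '<html lang="en"> <head> <meta http-equiv="Content- Type" content="text/html" /> '
    '<h1>BOTHROPS</h1> '
    '<p style="font-size: 18px;">uses [Both-l uses]</p> '
    'Haemor- rhages- dark Fear- of death E-mail us '
    '<h6>Remedies A- Z</h6> <ul> Some- list- here </ul> '
    '<style type="text/css"> .right { border- width:1px; } </style>'
)

print(fix_hyphens(sample))
```

Note that on this sample the first range runs from the html tag to the first "/>", because "/>" is reached before any "</html>"; the effect is still that the head part is skipped.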