How to replace only one period (dot) and skip the others?
-
Block of text for testing:-
<html lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <META name="viewport" content="width=device-width, initial-scale=1"> <title>Homeopathic medicine, Homeopathic remedies, CARCINOSIN, Carc</title> <META name="description" content="CARCINOSIN Bangalore, Carc"> <META name="keywords" content="Homeopathic medicine Bangalore, Homeopathic remedies Bangalore, CARCINOSIN Bangalore, Carc"> <META name="robots" content="index, follow" /> <link rel="canonical" href="https://cure.com/CARCINOSIN.html"> <META name="google-site-verification" content="B5jrpKjfHEj--_J-rT51c3CG8zg1sY_ZRQAbqQ1oN5Q"> <link href="css/style.css" rel="stylesheet" type="text/css" media="all"> <link href="https://www.cure4incurables.in/css/bootstrap.min.css" rel="stylesheet"> <link href="https://www.cure4incurables.in/css/style1.css" rel="stylesheet"> <link href="css/style.css" rel="stylesheet" type="text/css" media="all"> <link rel="stylesheet" type="text/css" href="engine1/style.css" media="screen"> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css"> </head> <script src="js/bootstrap.min.js"></script> <script src="js/backtotop.js"></script>cure.com <p>cure. com</p>
I want to replace only the . in the cure.com and skip all the other periods (dots).
I tried this regular expression to no avail:
(<link[\S\s]*?<\/head>)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(\.\s*\w)(*SKIP)(*F)|(\.)
—
moderator added code markdown around text; please don’t forget to use the </> button to mark example text as “code” so that characters don’t get changed by the forum
-
@dr-ramaanand
It’s a bit confusing, as you say you want to change ONLY 1 DOT character, yet in your small example block there are 5 instances of “cure.”. So is it only one of those, or all 5 of those? Also, the last “cure.” instance has a following space:
<p>cure. com</p>
The space is just after the DOT character.
Terry
EDIT: Sorry, it was actually only 3 instances in the example block. I had Find counting in regex mode, not literal, so the DOT character matched any character.
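The difference between the two counts in the EDIT can be reproduced outside Notepad++. The snippet below is only a rough sketch in Python (not the editor's own engine), and `sample` is a shortened stand-in for the test block, kept to the lines that contain "cure":

```python
import re

# Shortened stand-in for the test block in the question (an assumption, not the full text).
sample = (
    '<link rel="canonical" href="https://cure.com/CARCINOSIN.html"> '
    '<link href="https://www.cure4incurables.in/css/bootstrap.min.css" rel="stylesheet"> '
    '<link href="https://www.cure4incurables.in/css/style1.css" rel="stylesheet"> '
    '</head>cure.com <p>cure. com</p>'
)

# Regex mode: the unescaped dot matches any character, so "cure4" is counted too.
regex_hits = len(re.findall(r"cure.", sample))

# Literal mode: re.escape() turns the dot into \. so only a real "cure." is counted.
literal_hits = len(re.findall(re.escape("cure."), sample))

print(regex_hits, literal_hits)  # 5 3
```

The two extra regex-mode hits come from the `cure4incurables` URLs, where the dot happily matches the `4`.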
-
Remember to format your example data. It’s not like you don’t know how, because you remembered to format the regex in the same post.
But HTML and the URLs used in your example data don’t come through the way you expect, and all your quote marks get changed by the forum into smart quotes when you choose to ignore formatting.
If you were new to the forum, it would just warrant a reminder. But for someone who has been using the forum as their personal regex writing service for two years, you should at least go to the effort of formatting your post (and looking at the PREVIEW in order to verify it’s been formatted), even if you’ve ignored all warnings that this forum is not a regex writing service.
By you “forgetting” to format, you are just making it harder for members of the Community to help you. I’d think that you’d want to do everything in your power to make it easier for Terry and Guy to help you, considering how much they bend over backwards to give answers to your years of regex questions. But instead, you repay their kindness and grace with a lack of effort.
----
Useful References
- Please Read Before Posting
- Template for Search/Replace Questions
- Formatting Forum Posts
- Notepad++ Online User Manual: Searching/Regex
- FAQ: Where to find other regular expressions (regex) documentation
----
Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.
-
@Terry-R I want to skip the dot in the cure. com as well as all the other dots, including those between the < and >, but find/match the dot in the cure.com
-
Still not sure which cure.com you are looking for, but…
Yes, as @PeterJones said, you do seem to lean a lot on the regulars here to cook up a regex for you. Whilst you have provided some regexes you have tried with your various questions, it still seems you stop trying too quickly and just want help. That would suggest minimal effort on your behalf so that you can feel good about asking for help.
Since your request seemed to be looking for text in a specific area of the file, did you not think to look at the FAQ post here and try to input the data to suit your example? I did, and it quickly found the 1 entry I think you were looking for. You were directed to this exact same FAQ post 2 years ago by @PeterJones.
(?-si:</[^>]+>|(?!\A)\G)(?s-i:(?!<).)*?\K(?-si:cure\.[a-z])
Terry
-
@Terry-R Thanks a lot. Can you also provide me a link to read about the (*SKIP)(*F) method when 2 or more strings need to be skipped? I could not find anything online.
-
Well then you haven’t looked. Again, it seems that you stop looking/trying too early. @guy038 brought these backtracking controls to this forum; use this forum’s search function. However, I’d suggest looking at other documentation, and for that start with yet another of the FAQ posts here.
Another is rexegg.com.
Terry
-
Hello, @dr-ramaanand, @terry-r, @peterjones and All,
@dr-ramaanand, you said :
I want to replace only the . in the cure.com and skip all the other periods (dots)
OK, I understand, but by which character or string do you want to replace this literal dot ? Do you need to delete the .com string ? I’m just curious !
Here is a variant of the @terry-r solution :
SEARCH
(?-i:</[a-z]+>|(?!\A)\G)(?s:(?!<).)*?cure\K\.(?=[a-z]+)
Anyway, for most of your questions, the generic regex, exposed in this post below, seems to be your best friend !
For a nice explanation of the (*SKIP)(*F) syntax, follow the link below :
https://www.rexegg.com/backtracking-control-verbs.php#skipfail
A generic regex, corresponding to its behavior, could be :
What_I_don’t_want(*SKIP)(*FAIL)|What_I_want
or
What_I_don’t_want(*SKIP)(*F)|What_I_want
I finally was able to find a regex, using these two backtracking control verbs, which seems adapted to your present problem :
SEARCH
<[^<>]+>(*SKIP)(*F)|(?-i)\.(?=[a-z]+)
So, as you said :
- Anything between a < and a > character is ignored
- Because of the look-ahead, which expects some lower-case letters right after the literal . , it also ignores the cure. com case !
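For anyone who wants to try the same idea outside Notepad++, here is a rough sketch in Python. Python's standard `re` module has no (*SKIP)(*F) verbs, so this swaps in a different technique: an alternation where the unwanted part (a whole tag) is matched but put back unchanged, and only the captured dot is replaced. The function name, the shortened `sample` and the `[dot]` replacement are assumptions made for the illustration, since the replacement string was never stated in the question.

```python
import re

# Left branch: a whole tag, matched only so that it gets consumed and put back as-is.
# Right branch: a literal dot followed by a lower-case letter, captured in group 1.
pattern = re.compile(r"<[^<>]+>|(\.)(?=[a-z])")

def replace_text_dots(text, replacement):
    """Replace dots outside tags; hypothetical helper, not part of any library."""
    def keep_or_replace(match):
        # group(1) is set only when the right-hand branch (the wanted dot) matched.
        return replacement if match.group(1) is not None else match.group(0)
    return pattern.sub(keep_or_replace, text)

# Shortened stand-in for the test block in the question.
sample = '<link href="https://cure.com/CARCINOSIN.html"> </head>cure.com <p>cure. com</p>'
result = replace_text_dots(sample, "[dot]")
print(result)
# <link href="https://cure.com/CARCINOSIN.html"> </head>cure[dot]com <p>cure. com</p>
```

The dot inside the `href` survives because the whole tag is consumed by the left branch, and the dot in `cure. com` survives because the look-ahead fails on the space.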
Now, you should be interested by the general regex, below, which matches, in an HTML document, any text that is not composed ONLY of blank chars, even spread over several lines, lying within any >............< range, whatever it is !
SEARCH / MARK
>\s+<(*SKIP)(*F)|(?<=>)[^<>]+(?=<)
Notes :
- Contrary to the previous regex, with that new regex, we’re searching for text between a > and the nearest < character
- If the zone of chars, within the >.......< range, contains ONLY space characters, this zone is simply ignored, because of the (*SKIP)(*F) syntax
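The two notes above can be approximated in Python as well. This is only a sketch under assumptions: the standard `re` module lacks (*SKIP)(*F), so the blank-only zones are filtered out afterwards instead of being skipped during the search, and `text_zones` is a made-up helper name.

```python
import re

def text_zones(html):
    """Return the text zones lying between a '>' and the nearest '<'."""
    # Every non-empty run of chars between a ">" and the nearest "<" ...
    zones = re.findall(r"(?<=>)[^<>]+(?=<)", html)
    # ... minus the runs made only of whitespace, which the
    # >\s+<(*SKIP)(*F) branch discards in the Notepad++ version.
    return [zone for zone in zones if zone.strip()]

sample = "<title>Homeopathic medicine</title> <p>cure. com</p>\n</body>"
print(text_zones(sample))  # ['Homeopathic medicine', 'cure. com']
```

The single space after `</title>` and the line-break after `</p>` are both found by the first step and then dropped by the filter.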
Test that regex against your initial text, pasted in a new tab ( I slightly changed it, in order to add some line-breaks ! )
<html lang="en"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <META name="viewport" content="width=device-width, initial-scale=1"> <title>Homeopathic medicine, Homeopathic remedies, CARCINOSIN, Carc</title> <META name="description" content="CARCINOSIN Bangalore, Carc"> <META name="keywords" content="Homeopathic medicine Bangalore, Homeopathic remedies Bangalore, CARCINOSIN Bangalore, Carc"> <META name="robots" content="index, follow" /> <link rel="canonical" href="https://cure.com/CARCINOSIN.html"> <META name="google-site-verification" content="B5jrpKjfHEj--_J-rT51c3CG8zg1sY_ZRQAbqQ1oN5Q"> <link href="css/style.css" rel="stylesheet" type="text/css" media="all"> <link href="https://www.cure4incurables.in/css/bootstrap.min.css" rel="stylesheet"> <link href="https://www.cure4incurables.in/css/style1.css" rel="stylesheet"> <link href="css/style.css" rel="stylesheet" type="text/css" media="all"> <link rel="stylesheet" type="text/css" href="engine1/style.css" media="screen"> <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css"> </head> <script src="js/bootstrap.min.js"></script> <script src="js/backtotop.js"></script>cure.com <p>cure. c om</p>
Interesting, isn’t it ! You may try this regex against some of your HTML documents, too !
Of course, the more general regex, below, matches absolutely any non-null range of chars, even multi-line ones, located between a > and the nearest < character of an HTML document
SEARCH
(?<=>)[^<>]+(?=<)
Best Regards,
guy038
-
-
@guy038 I was trying to skip finding/matching the dots and periods (full stops) that I understood were necessary and find the rest.
I finally found just two or three unnecessary ones in a folder containing 300+ files, using this regular expression:
(<link[\S\s]*?<\/h1>)(*SKIP)(*F)|(<div style="margin-bottom:-15px;width: 100%;background-color:#EBF4FB;">[\S\s]*?<div class="left">)(*SKIP)(*F)|(<!--[\S\s]*?\-->)(*SKIP)(*F)|(<[\S\s]*?>)(*SKIP)(*F)|(Note:[\S\s]*?<\/span>)(*SKIP)(*F)|(\.\s*\w)(*SKIP)(*F)|(<ul[^<>]*+>[\S\s]*?</ul>)(*SKIP)(*F)|(<style[^<>]*+>[\S\s]*?</style>)(*SKIP)(*F)|(for\s*any\s*questions\/\s*treatment\s*\.)(*SKIP)(*F)|(Efficacy\s*studies\s*\.)(*SKIP)(*F)|(All\s*rights\s*reserved\s*\.)(*SKIP)(*F)|(a\.m\.)(*SKIP)(*F)|(p\.m\.)(*SKIP)(*F)|(\.\s*\[)(*SKIP)(*F)|(\.\s*\()(*SKIP)(*F)|(\]\.)(*SKIP)(*F)|(etc\.)(*SKIP)(*F)|(C\.V\.S\.)(*SKIP)(*F)|(C\.V\.A\.)(*SKIP)(*F)|(C\.N\.S\.)(*SKIP)(*F)|(G\.I\.T\.)(*SKIP)(*F)|(\.\])(*SKIP)(*F)|(\.\))(*SKIP)(*F)|(B\.P\.)(*SKIP)(*F)|(\.\s*<)(*SKIP)(*F)|(\.\s*>)(*SKIP)(*F)|(\.\s*')(*SKIP)(*F)|(\.\s*"\w)(*SKIP)(*F)|(\.,\s*\w)(*SKIP)(*F)|(\.\s*")(*SKIP)(*F)|(M\.D\.)(*SKIP)(*F)|(R\.S\.)(*SKIP)(*F)|(identity\.)(*SKIP)(*F)|\.
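A skip list this long is easier to maintain when it is assembled from separate pieces. The sketch below does that in Python, purely as an illustration: it uses a handful of the skip patterns from the regex above, and, because Python's standard `re` module has no (*SKIP)(*F), it relies on capture-group alternation instead (only the last branch captures). The names `skip_patterns` and `find_stray_dots` and the `sample` string are made up for the example.

```python
import re

# A few of the "dots to skip" patterns from the regex above, one per entry.
skip_patterns = [
    r"<[^<>]+>",   # anything inside a tag
    r"a\.m\.",
    r"p\.m\.",
    r"etc\.",
    r"\.\s*\w",    # a dot followed by a word character
]

# Skip branches are non-capturing; only the final branch captures the wanted dot.
pattern = re.compile("|".join(f"(?:{p})" for p in skip_patterns) + r"|(\.)")

def find_stray_dots(text):
    """Return the offsets of dots not covered by any skip pattern."""
    return [m.start(1) for m in pattern.finditer(text) if m.group(1) is not None]

sample = "Open 9 a.m. to 5 p.m. etc. <b>x.y</b> end ."
print(find_stray_dots(sample))
```

Here only the final dot of `sample` is reported, since every other dot is consumed by one of the skip branches first.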
Thanks a lot guys! Please keep helping people who ask questions here, I don’t know who else will. Thanks again!