Regex for Searching <HEAD> Section
-
I want to use Notepad++ to find soft hyphen characters (ISO 8859: 0xAD, Unicode U+00AD SOFT HYPHEN, HTML: &shy;) in the <head> section of my HTML files. I tried the two regular expressions below, but both return zero hits:
<head>.*?­.*?</head>
<head>.*­.*</head>
Curiously, the following regex does find soft hyphens in <figcaption> sections:
<figcaption>.*?.*?</figcaption>
I suspect the issue is that the <head> section contains newlines. I tried the search with the “. matches newline” option both checked and unchecked, and got zero hits either way.
Is there a way to do this kind of search in Notepad++?
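As a quick sanity check on the three notations listed in the question, this short Python sketch (Python is used here only for illustration, not something Notepad++ needs) decodes each one and confirms they all name the same single character. It also shows that in a UTF-8 encoded file that character is stored as the two bytes C2 AD, which is worth knowing when reading hex dumps or searching by byte value.

```python
import html
import unicodedata

# The three notations from the question, each turned into a Python string.
from_latin1 = bytes([0xAD]).decode("latin-1")   # ISO 8859-1 byte 0xAD
from_codepoint = "\u00ad"                        # Unicode code point U+00AD
from_entity = html.unescape("&shy;")             # HTML named entity

# All three decode to the same single character.
assert from_latin1 == from_codepoint == from_entity
print(unicodedata.name(from_codepoint))          # SOFT HYPHEN

# In a UTF-8 encoded file the same character occupies two bytes.
print(from_codepoint.encode("utf-8"))            # b'\xc2\xad'
```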
-
I think the code blocks you used above are hiding your soft-hyphen character, at least visually. I find that if I copy and paste them into Notepad++, the soft-hyphen character reappears.
Anyway, I would try searching for:
(?s)<head>.*?\x{00AD}.*?</head>
I think there have been some recent postings reporting that Unicode characters typed literally into the Find what box of the Find dialog don't always match correctly…?
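The effect of (?s) and of spelling the soft hyphen as an escape can be tried outside Notepad++ with Python's re module. This is only a sketch on a made-up sample document: Notepad++ uses the Boost regex engine, and Python writes the escape as \xAD (or \u00AD) instead of \x{00AD}, but (?s) means the same thing in both flavours, namely that "." may also match a newline.

```python
import re

# Hypothetical sample: the <head> spans several lines and contains one
# soft hyphen (U+00AD), written here as the escape \u00ad.
doc = (
    "<html>\n"
    "<head>\n"
    "<title>Hy\u00adphen test</title>\n"
    "</head>\n"
    "<body>no soft hyphen here</body>\n"
    "</html>\n"
)

# Using an escape avoids having to type the invisible character into the pattern.
without_s = re.compile(r"<head>.*?\xAD.*?</head>")
with_s = re.compile(r"(?s)<head>.*?\xAD.*?</head>")

print(without_s.search(doc))            # None: '.' stops at every newline
print(with_s.search(doc) is not None)   # True: (?s) lets '.' cross newlines
```

Without (?s) the pattern cannot get past the line break right after <head>, so it fails even though a soft hyphen is present; with (?s) the whole multi-line section is matched.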
-
Hello, @aksarben, @alan-kilborn and All,
Simply use this regex S/R:
SEARCH
(?s)(.*?<head>|\G)((?!</head>).)*?\K\xAD
REPLACE
Any SINGLE character or STRING
Notes:
- I assume, of course, that there is only one <head>........</head> section per file.
- The <head>........</head> section can be either on a single line or split across several lines.
- Any soft hyphen found above the starting tag <head> is ignored.
- Any soft hyphen between the starting and the ending tag is found individually.
- Any soft hyphen found below the ending tag </head> is ignored.
- Preferably, when testing on a single file, tick the Wrap around option, which forces the S/R process to start from the very beginning of the file.
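In the Boost syntax that Notepad++ uses, \G anchors each new match at the end of the previous one and \K drops everything matched so far, so only the soft hyphen itself is selected. Python's standard re module supports neither, so the sketch below does not port the regex. Instead it reimplements, in plain Python, the behaviour described in the notes: only the first <head>...</head> section is examined, each soft hyphen inside it is reported individually, and soft hyphens above <head> or below </head> are left alone. The function names and sample string are hypothetical, and treating an unclosed <head> as running to the end of the text is an assumption.

```python
from __future__ import annotations

SOFT_HYPHEN = "\u00ad"


def head_bounds(text: str) -> tuple[int, int] | None:
    """Return (start, end) offsets of the content between the first <head>
    and the following </head>, or None if there is no <head> tag."""
    open_at = text.find("<head>")
    if open_at == -1:
        return None
    start = open_at + len("<head>")
    end = text.find("</head>", start)
    # Assumption: an unclosed <head> is treated as running to end of text.
    return start, (len(text) if end == -1 else end)


def find_soft_hyphens_in_head(text: str) -> list[int]:
    """Offsets of every soft hyphen inside the <head> section, one per hit."""
    bounds = head_bounds(text)
    if bounds is None:
        return []
    start, end = bounds
    return [i for i in range(start, end) if text[i] == SOFT_HYPHEN]


def replace_soft_hyphens_in_head(text: str, replacement: str) -> str:
    """Replace soft hyphens inside <head> only; the rest of the text is kept."""
    bounds = head_bounds(text)
    if bounds is None:
        return text
    start, end = bounds
    head = text[start:end].replace(SOFT_HYPHEN, replacement)
    return text[:start] + head + text[end:]


# Soft hyphens before <head>, inside it (two), and after </head>.
doc = "a\u00adb\n<head>\n<title>x\u00ady\u00adz</title>\n</head>\n<p>c\u00add</p>\n"
print(find_soft_hyphens_in_head(doc))   # [19, 21]
print(replace_soft_hyphens_in_head(doc, "-"))
```

Only the two soft hyphens in the title are reported and replaced; the ones in "a­b" and "c­d" stay as they were, matching the "ignored" cases in the notes.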
Best Regards,
guy038
-