Regex: Select all non-ASCII characters html tags
-
Hello. I have these 4 lines. I want to make a regex that finds only those HTML tags that contain non-ASCII characters (ignoring the <em></em> tags).
Line 1 <p class="OANA"><em>评论家Drive有一些重要的东西来说，Home Edition关于总是分享胜利的人才，转向他们的起源：</em></p>
Line 2 <p class="OANA"><em>评论家Drive有一些重要的东西来说，<em>Home Edition关于总是分享胜利的</em>人才，转向他们的起源：</em></p>
Line 3 <p class="OANA"><em>评论家有一些重要的东西来说，关于总是分享胜利的人才，转向他们的起源：</em></p>
Line 4 <p class="OANA"><em>What is it called when you love your car?</em></p>
The output should be Line 1, Line 2 and Line 4 (so it doesn't have to match Line 3).
My regex is not good; it finds all of them.
Find:
<p class="OANA">+(?!\w+<em>).*(\w+[\x00-\x7F]).*</p>
Also, it would be a nice idea to use another regex to find only Line 4.
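One possible reading of the expected output (Lines 1, 2 and 4, but not Line 3) is "paragraphs whose visible text contains at least one ASCII letter". The sketch below tests that reading in Python rather than in the Notepad++ Find dialog; the names `TAG`, `ASCII_LETTER` and `has_ascii_text` are hypothetical, and the interpretation itself is an assumption, not something confirmed in the thread.

```python
import re

TAG = re.compile(r"<[^>]+>")            # any HTML tag, e.g. <p class="OANA"> or </em>
ASCII_LETTER = re.compile(r"[A-Za-z]")  # at least one ASCII letter

def has_ascii_text(line: str) -> bool:
    """True if the text left after removing the tags contains an ASCII letter."""
    return bool(ASCII_LETTER.search(TAG.sub("", line)))

lines = [
    '<p class="OANA"><em>评论家Drive有一些重要的东西来说,Home Edition关于总是分享胜利的人才,转向他们的起源:</em></p>',
    '<p class="OANA"><em>评论家Drive有一些重要的东西来说,<em>Home Edition关于总是分享胜利的</em>人才,转向他们的起源:</em></p>',
    '<p class="OANA"><em>评论家有一些重要的东西来说,关于总是分享胜利的人才,转向他们的起源:</em></p>',
    '<p class="OANA"><em>What is it called when you love your car?</em></p>',
]

# 1-based numbers of the lines that satisfy the assumed criterion
selected = [n for n, line in enumerate(lines, start=1) if has_ascii_text(line)]
print(selected)  # [1, 2, 4]
```

Stripping the tags first keeps the ASCII letters inside `<p class="OANA">` and `<em>` from counting as text.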
-
Hello, @robin-cruise,
I don’t really understand what you want!
First, to match any Chinese character, you must use the range described by the [\x{4E00}-\x{9FFF}] character class. However, your text, between the <em> and </em> tags, also contains some fullwidth punctuation characters such as ， and ：
So, if I mark all characters with the regex [\x{4E00}-\x{9FFF}\x{FF00}-\x{FFEF}], in your sample, it matches 102 characters and, between the outer <em> and </em> tags:
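As a rough cross-check of that count outside Notepad++, here is a minimal Python sketch. It assumes the comma and colon in the sample are the fullwidth forms U+FF0C and U+FF1A, and it uses Python's `\uXXXX` spelling because the `re` module does not accept the `\x{4E00}` form. The names `MARK`, `count_marked` and `line3_body` are hypothetical.

```python
import re

# Same two ranges as the Notepad++ class [\x{4E00}-\x{9FFF}\x{FF00}-\x{FFEF}]
MARK = re.compile(r"[\u4e00-\u9fff\uff00-\uffef]")

def count_marked(text: str) -> int:
    """Count characters in the CJK ideograph or fullwidth-forms ranges."""
    return len(MARK.findall(text))

# Text between the <em> tags of Line 3; \uff0c and \uff1a are the
# (assumed) fullwidth comma and colon.
line3_body = "评论家有一些重要的东西来说\uff0c关于总是分享胜利的人才\uff0c转向他们的起源\uff1a"

print(count_marked(line3_body))  # 34
```

Lines 1 and 2 carry the same 34 characters plus their ASCII words, so three such lines give 3 × 34 = 102, which agrees with the count reported above.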
- Line 1 contains two ASCII strings (Drive and Home Edition) and three ranges of non-ASCII characters
- Line 2 contains two ASCII strings (Drive and Home Edition) and four ranges of non-ASCII characters
- Line 3 contains one range of non-ASCII characters only, without any ASCII char
- Line 4 contains one range of ASCII characters only (What is it called when you love your car?)
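The breakdown above can be reproduced by splitting a string into alternating ASCII and non-ASCII runs. This is only an illustrative Python sketch for Line 1, again assuming fullwidth punctuation (U+FF0C, U+FF1A); `RUN`, `split_runs` and `line1_body` are hypothetical names.

```python
import re

# One alternative per kind of run: ASCII (0x00-0x7F) or anything else
RUN = re.compile(r"[\x00-\x7f]+|[^\x00-\x7f]+")

def split_runs(text: str):
    """Return (kind, run) pairs, where kind is 'ascii' or 'non-ascii'."""
    return [
        ("ascii" if m.group(0).isascii() else "non-ascii", m.group(0))
        for m in RUN.finditer(text)
    ]

# Text between the outer <em> tags of Line 1 (fullwidth comma/colon assumed)
line1_body = (
    "评论家Drive有一些重要的东西来说\uff0c"
    "Home Edition关于总是分享胜利的人才\uff0c转向他们的起源\uff1a"
)

runs = split_runs(line1_body)
ascii_runs = [run for kind, run in runs if kind == "ascii"]
non_ascii_runs = [run for kind, run in runs if kind == "non-ascii"]

print(ascii_runs)           # ['Drive', 'Home Edition']
print(len(non_ascii_runs))  # 3
```

For Line 2 the inner `<em>` and `</em>` tags would have to be removed first, since they are ASCII themselves and would otherwise show up as extra runs.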
Now, the question is: what do you want to do?
On the other hand, your last question is:
also, will be a nice idea to use another regex as to find only the Line 4.
I suppose that the following regex
(?<=<p class="OANA"><em>)[\x00-\x7F]+?(?=</em>)
should work: it matches the shortest range of ASCII characters after the string <p class="OANA"><em>, up to, but not including, the string </em>.
Best Regards,
guy038
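For anyone who wants to try the suggested regex outside Notepad++, here is a small Python sketch. The pattern is copied unchanged (the look-behind has a fixed length, so Python's `re` accepts it); `LINE4_ONLY` and `ascii_only_body` are hypothetical names, and the sample lines are abbreviated stand-ins for Lines 1, 3 and 4.

```python
import re
from typing import Optional

# Look-behind for the opening tags, lazy ASCII-only body, look-ahead for </em>
LINE4_ONLY = re.compile(r'(?<=<p class="OANA"><em>)[\x00-\x7F]+?(?=</em>)')

def ascii_only_body(line: str) -> Optional[str]:
    """Return the ASCII-only text right after <p class="OANA"><em>, or None."""
    m = LINE4_ONLY.search(line)
    return m.group(0) if m else None

samples = [
    '<p class="OANA"><em>评论家Drive有一些重要的东西来说</em></p>',   # starts non-ASCII
    '<p class="OANA"><em>评论家有一些重要的东西来说</em></p>',        # non-ASCII only
    '<p class="OANA"><em>What is it called when you love your car?</em></p>',
]

results = [ascii_only_body(s) for s in samples]
print(results)  # [None, None, 'What is it called when you love your car?']
```

The first two samples fail because the character immediately after the look-behind is non-ASCII, so the ASCII-only class cannot start matching there.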
-
-
@guy038 said in Regex: Select all non-ASCII characters html tags:
(?<=<p class="OANA"><em>)[\x00-\x7F]+?(?=</em>)
thanks a lot @guy038