Regex: Find which tags do not contain English characters (ASCII/UTF-8)



  • Hello. I have HTML tags like these:

    <title>I love cars | My name (am) </title>

    <title>የማይታይ እሳት በታላቅነት ከተያዘ ነፍስ | My name (am) </title>

    I want to find the tags whose content starts with non-English (non-ASCII) characters.

    The result of the search should be line 2:

    <title>የማይታይ እሳት በታላቅነት ከተያዘ ነፍስ | My name (am) </title>

    How can I do this, please?





  • Hello, @robin-cruise and All,

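    The regex below searches for any range <title> •••••• </title> in which at least one character, between <title> and </title>, has a code-point over \x{007F}

    SEARCH (?s-i)<title>(?=(?:(?!</title>).)*?[^\x00-\x7f]).+?</title>

    The positive look-ahead (?=(?:(?!</title>).)*?[^\x00-\x7f]), after the opening <title> tag, looks for a char over \x7F, located further on and before the ending </title> tag. The inner (?!</title>) check is needed because, with (?s), the dot also matches line breaks: without it, the look-ahead could run past the closing tag of an all-ASCII title and find a non-ASCII char in a later <title> range of the same file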
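
    For testing the idea outside the editor, here is a minimal Python sketch. It is an adaptation, not the exact search above: Python's re module does not accept a global (?s-i), so re.DOTALL is used instead, and the look-ahead stops at the first </title> so that it cannot pick up a non-ASCII char from a later title in the same string. The names NON_ASCII_TITLE and non_ascii_titles are hypothetical.

```python
import re

# Hypothetical names. re.DOTALL stands in for the inline (?s) flag.
# The inner (?!</title>) check keeps the look-ahead inside the current title.
NON_ASCII_TITLE = re.compile(
    r"<title>(?=(?:(?!</title>).)*?[^\x00-\x7f]).+?</title>",
    re.DOTALL,
)


def non_ascii_titles(html):
    """Return every <title>...</title> range holding a non-ASCII character."""
    return NON_ASCII_TITLE.findall(html)


sample = (
    "<title>I love cars | My name (am) </title>\n"
    "<title>የማይታይ እሳት በታላቅነት ከተያዘ ነፍስ | My name (am) </title>\n"
)

for title in non_ascii_titles(sample):
    print(title)
```

    On the two sample lines, only the second title is returned.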

    Best Regards

    guy038



  • Super answer, @guy038, thanks.

    I made a short version of yours:

    <title>\K(?![^\x00-\x7F]+).*?\| - finds the first line

    <title>\K([^\x00-\x7F]+).*?\| - finds the second line
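
    A quick way to check the two short patterns is the Python sketch below. Python's re module has no \K, so a fixed-width look-behind (?<=<title>) is used in its place, which is an assumption of this sketch. The helper first_match and the mixed sample line are hypothetical. Note that both patterns only test the first character after <title>, so a title that starts with ASCII and contains other characters later is treated as an "English" one.

```python
import re

# Python's re has no \K, so a look-behind stands in for "<title>\K".
STARTS_ASCII = re.compile(r"(?<=<title>)(?![^\x00-\x7F]+).*?\|")
STARTS_NON_ASCII = re.compile(r"(?<=<title>)([^\x00-\x7F]+).*?\|")


def first_match(pattern, line):
    """Return the matched text, or None when the pattern does not match."""
    found = pattern.search(line)
    return found.group(0) if found else None


line1 = "<title>I love cars | My name (am) </title>"
line2 = "<title>የማይታይ እሳት በታላቅነት ከተያዘ ነፍስ | My name (am) </title>"
# Hypothetical mixed title, not from the thread: ASCII first, Ethiopic later.
mixed = "<title>Cars የማይታይ | My name (am) </title>"

for name, line in (("line1", line1), ("line2", line2), ("mixed", mixed)):
    print(name, first_match(STARTS_ASCII, line), first_match(STARTS_NON_ASCII, line))
```

    If mixed titles can occur, a search that scans the whole title range is the safer choice.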

