Regex: Parsing / Extract the content of a html tag and save it another file?



  • Hello, I have several HTML files. Is it possible to extract the content of the <title></title> tag from each file and save it all to another file? For example:

    file-1.html, file-2.html, file-3.html, … file-800.html

    Each of them has the same tag, but with different content:

    file-1.html
    <title>My name is Prince</title>

    file-2.html
    <title>I love cars</title>

    file-800.html
    <title>My book is here</title>

    So I need to extract the content of these tags and save it to another file, for example save.txt.

    In save.txt I will have:

    My name is Prince
    I love cars
    ...
    My book is here
    

    The regex to select the content of all title tags is this: (?s)<title>(.*?)<\/title>. What should I do next to save all the results automatically?



  • @Vasile-Caraus said in Regex: Parsing / Extract the content of a html tag and save it another file?:

    What should I do next

    I would run a Find in Files search, select all of the output in the Search results window with Ctrl+A, copy it with Ctrl+C, then paste it into a new N++ tab and process that output with further regular expression replacements…



  • @Vasile-Caraus said in Regex: Parsing / Extract the content of a html tag and save it another file?:

    (?s)<title>(.*?)</title>

    OK, so I ran this regex in all files: (?s)(<title>)(.*?)(<\/title>). I copied the results into save.txt and got something like this:

    <title>My name is Prince</title>
    <title>I love cars</title>
    <title>My book is here</title>
    

    Now I must extract the content from the tags, so I use the same regex with Replace:

    Find: (?s)(<title>)(.*?)(<\/title>)
    Replace with: \2

    The output

    My name is Prince
    I love cars
    My book is here
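
For readers who want to check the replace step outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module treats this particular pattern the same way the editor does. The `pasted` string is only sample data copied from the post above.

```python
import re

# Same pattern as in the post; (?s) lets "." match newlines too.
TITLE_RE = re.compile(r"(?s)(<title>)(.*?)(<\/title>)")

# Sample text standing in for the pasted search results.
pasted = (
    "<title>My name is Prince</title>\n"
    "<title>I love cars</title>\n"
    "<title>My book is here</title>\n"
)

# \2 keeps only the second capture group, i.e. the text between the tags.
cleaned = TITLE_RE.sub(r"\2", pasted)
print(cleaned, end="")
```

Only group 2 is needed here, so the first and third groups could be dropped and the replacement written as `\1` against `(?s)<title>(.*?)<\/title>`.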
    

    Thanks. I thought it could be done in one move. :)
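
On the "one move" idea: outside Notepad++, a short script could probably do the whole job in a single pass. The sketch below is only an illustration, not something proposed in the thread. The function name `extract_titles`, the `folder` argument, and the UTF-8 encoding are all assumptions, and it takes only the first `<title>` found in each file.

```python
import re
from pathlib import Path

# The regex from the original question, with one capture group for the content.
TITLE_RE = re.compile(r"(?s)<title>(.*?)</title>")


def extract_titles(folder, out_file="save.txt"):
    """Write the first <title> of every .html file in `folder` to `out_file`."""
    titles = []
    # sorted() orders names as plain text, so file-10.html comes before file-2.html.
    for path in sorted(Path(folder).glob("*.html")):
        # Assumes UTF-8; undecodable bytes are replaced rather than raising.
        text = path.read_text(encoding="utf-8", errors="replace")
        match = TITLE_RE.search(text)
        if match:
            titles.append(match.group(1).strip())
    # One title per line, as in the desired save.txt.
    Path(out_file).write_text("".join(t + "\n" for t in titles), encoding="utf-8")
    return titles
```

Calling `extract_titles("some-folder")` (a placeholder path) would create save.txt in the current directory. Files without a `<title>` tag are skipped silently, which may or may not be what you want for 800 files.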

