
Regex: Parsing / Extract the content of a html tag and save it another file?

Help wanted
3 Posts 2 Posters 2.2k Views
This topic has been deleted. Only users with topic management privileges can see it.
  • V
    Vasile Caraus
    last edited by May 24, 2021, 1:03 PM

    Hello, I have several HTML files. Is it possible to extract the content of the <title></title> tag in each file and save it to another file? For example:

    file-1.html, file-2.html, file-3.html, …, file-800.html

    Each of them has the same tag, but with different content:

    file-1.html
    <title>My name is Prince</title>

    file-2.html
    <title>I love cars</title>
    …
    file-800.html
    <title>My book is here</title>

    So I need to extract the content of these tags and save it into another file, for example save.txt.

    In save.txt I will have:

    My name is Prince
    I love cars
    ...
    My book is here
    

    The regex to select the content of all title tags is this: (?s)<title>(.*?)<\/title>

    What should I do next to save all the results automatically?
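If a scripting language is available, the whole job can be done in one pass outside Notepad++. The following is a minimal Python sketch under several assumptions. It assumes the files sit in one folder, match the glob pattern file-*.html and are UTF-8. It applies the same <title>(.*?)</title> idea, with re.DOTALL standing in for (?s), and writes one title per line to save.txt. The helper names (file_number, titles_from_html, save_titles) are hypothetical and chosen for illustration.

```python
import re
from pathlib import Path

# Same idea as (?s)<title>(.*?)</title>: re.DOTALL is Python's (?s), and the
# lazy .*? stops at the first closing tag. Matching is case-sensitive.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def file_number(path: Path) -> int:
    """Number in a name like 'file-12.html', so file-2 sorts before file-10."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else 0


def titles_from_html(text: str) -> list[str]:
    """Content of each <title> tag, with internal whitespace collapsed to one line."""
    return [" ".join(m.split()) for m in TITLE_RE.findall(text)]


def save_titles(folder: Path, pattern: str = "file-*.html", out_name: str = "save.txt") -> int:
    """Write one title per line to folder/out_name; return how many were written."""
    titles = []
    # Assumption: files are named file-<number>.html and should be read in numeric order.
    for path in sorted(folder.glob(pattern), key=file_number):
        text = path.read_text(encoding="utf-8", errors="replace")
        titles.extend(titles_from_html(text))
    (folder / out_name).write_text("".join(t + "\n" for t in titles), encoding="utf-8")
    return len(titles)
```

Usage: save_titles(Path(r"C:\path\to\html")). The folder path is a placeholder. Sorting by the number in the file name keeps file-2 ahead of file-10, which a plain alphabetical sort would not do.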

    • A
      Alan Kilborn @Vasile Caraus
      last edited by May 24, 2021, 1:15 PM

      @Vasile-Caraus said in Regex: Parsing / Extract the content of a html tag and save it another file?:

      What should I do next

      I would run a Find in Files search, then press Ctrl+A and then Ctrl+C in the Search results window to copy its output, paste that into a new N++ tab, and start processing it with more regular-expression replacements…
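The post-processing step described above could also be scripted. Below is a sketch, assuming the text copied from the Search results window is available as a string. It also assumes each hit line starts with "Line <number>:", possibly indented; that layout is an assumption and may differ between Notepad++ versions. Only hit lines are examined, because the summary line at the top can repeat the search pattern, and the pattern itself may contain <title>...</title>. The function name is hypothetical.

```python
import re

# Assumption: hit lines look like "\tLine 3: <title>My name is Prince</title>".
# The summary line ('Search "..." (N hits in M files ...)') and the per-file
# header lines do not match HIT_RE and are skipped.
HIT_RE = re.compile(r"^\s*Line \d+:\s?(.*)$")
# Each result is a single line, so no DOTALL here; a title spanning several
# lines in the HTML would not match.
TITLE_RE = re.compile(r"<title>(.*?)</title>")


def titles_from_search_results(dump: str) -> list[str]:
    """Return the text inside each <title>...</title> on the hit lines of the dump."""
    titles = []
    for line in dump.splitlines():
        hit = HIT_RE.match(line)
        if hit:
            titles.extend(TITLE_RE.findall(hit.group(1)))
    return titles
```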

      • V
        Vasile Caraus
        last edited by Vasile Caraus May 24, 2021, 1:25 PM (posted May 24, 2021, 1:24 PM)

        @Vasile-Caraus said in Regex: Parsing / Extract the content of a html tag and save it another file?:

        (?s)<title>(.*?)</title>

        OK, so I ran this regex, (?s)(<title>)(.*?)(<\/title>), across all the files. I copied the results into save.txt and got something like this:

        <title>My name is Prince</title>
        <title>I love cars</title>
        <title>My book is here</title>
        

        Now I need to extract the content from the tags, so I use the same regex with Replace:

        Find: (?s)(<title>)(.*?)(<\/title>)
        Replace with: \2

        The output:

        My name is Prince
        I love cars
        My book is here
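For comparison, the same Find/Replace can be expressed with Python's re.sub. A backreference \2 in the replacement refers to the second capture group, as it does in Notepad++. This is a minimal sketch, assuming the intermediate text holds one <title>...</title> per line as shown above. The function name strip_title_tags is hypothetical.

```python
import re

# Same pattern as the Notepad++ Find box: group 1 = <title>, group 2 = the
# content, group 3 = </title>. re.DOTALL corresponds to (?s).
PATTERN = re.compile(r"(<title>)(.*?)(</title>)", re.DOTALL)


def strip_title_tags(text: str) -> str:
    """Replace every <title>...</title> with just its content (group 2)."""
    return PATTERN.sub(r"\2", text)


sample = (
    "<title>My name is Prince</title>\n"
    "<title>I love cars</title>\n"
    "<title>My book is here</title>\n"
)
print(strip_title_tags(sample), end="")
```

Only group 2 is used, so groups 1 and 3 could be left uncaptured. They are kept here to mirror the regex from the thread.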
        

        Thanks. I thought it could be done in one move. :)

        The Community of users of the Notepad++ text editor.
        Powered by NodeBB | Contributors