
    Regex: Parsing / Extract the content of a html tag and save it another file?

    • Vasile Caraus

      Hello, I have several HTML files. Is it possible to extract the content of the <title></title> tag from each file and save the results in another file? For example:

      file-1.html, file-2.html, file-3.html, … file-800.html

      Each of them has the same tag, but with different content:

      file-1.html
      <title>My name is Prince</title>

      file-2.html
      <title>I love cars</title>
      …
      file-800.html
      <title>My book is here</title>

      So, I need to extract the content of these tags, and save them into another file, for example save.txt

      In save.txt I will have:

      My name is Prince
      I love cars
      ...
      My book is here
      

      The regex to select the content of all title tags is this: (?s)<title>(.*?)<\/title>. What should I do next to save all the results automatically?
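To see what that pattern captures, here is a minimal sketch in Python. It uses Python's `re` module rather than Notepad++'s own regex engine, so treat it as an approximation. The sample string is made up for illustration and is not taken from the real files.

```python
import re

# The pattern from the post: (?s) makes "." match newlines too,
# and the lazy ".*?" stops at the first closing tag it meets.
TITLE_RE = re.compile(r"(?s)<title>(.*?)<\/title>")

# Hypothetical sample input, including a title that spans two lines.
sample = "<head><title>My name is Prince</title></head>\n<title>I love\ncars</title>"

# findall returns only the text captured by group 1 for each match.
titles = TITLE_RE.findall(sample)
print(titles)
```

Without `(?s)`, the second title would not match, because `.` would stop at the line break.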

      • Alan Kilborn @Vasile Caraus

        @Vasile-Caraus said in Regex: Parsing / Extract the content of a html tag and save it another file?:

        What should I do next

        I would run a Find in Files search, press Ctrl+A and then Ctrl+C in the Search results window to copy the output, paste it into a new Notepad++ tab, and start processing that output with more regular expression replacements…

        • Vasile Caraus

          @Vasile-Caraus said in Regex: Parsing / Extract the content of a html tag and save it another file?:

          (?s)<title>(.*?)</title>

          OK, so I ran this regex in all files: (?s)(<title>)(.*?)(<\/title>). I copied the results into save.txt and got something like this:

          <title>My name is Prince</title>
          <title>I love cars</title>
          <title>My book is here</title>
          

          Now I must extract the content from the tags, so I use the same regex with Replace:

          Find: (?s)(<title>)(.*?)(<\/title>)
          Replace with: \2

          The output:

          My name is Prince
          I love cars
          My book is here
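The same Find/Replace step can be sketched in Python, assuming Python's `re` module behaves like Notepad++'s engine for this simple pattern. The variable names are illustrative only.

```python
import re

# Same three-group pattern as in the Find field above.
pattern = re.compile(r"(?s)(<title>)(.*?)(<\/title>)")

# The three lines copied into save.txt in the post.
text = (
    "<title>My name is Prince</title>\n"
    "<title>I love cars</title>\n"
    "<title>My book is here</title>\n"
)

# r"\2" keeps only the second group, i.e. the text between the tags.
result = pattern.sub(r"\2", text)
print(result)
```

Groups 1 and 3 (the tags themselves) are matched but dropped, which is why only the titles remain.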
          

          Thanks. I thought it could be done in one move. :)
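For what it's worth, a single pass is possible outside Notepad++ with a small script. The following is only a sketch in Python under a few assumptions: all the files sit in one folder, they end in `.html`, they are readable as UTF-8, and only the first title in each file is wanted. The function name `extract_titles` and the `out_name` parameter are hypothetical names chosen for this example.

```python
import re
from pathlib import Path

# Same idea as the regex in the thread: capture the text between the tags.
TITLE_RE = re.compile(r"(?s)<title>(.*?)</title>")


def extract_titles(folder, out_name="save.txt"):
    """Write the first <title> of every .html file in folder to out_name."""
    folder = Path(folder)
    titles = []
    # Plain alphabetical order, so file-10.html sorts before file-2.html.
    for path in sorted(folder.glob("*.html")):
        # errors="replace" is an assumption: the real encoding is unknown.
        html = path.read_text(encoding="utf-8", errors="replace")
        match = TITLE_RE.search(html)
        if match:
            titles.append(match.group(1).strip())
    # One title per line, written next to the HTML files.
    (folder / out_name).write_text("\n".join(titles) + "\n", encoding="utf-8")
    return titles
```

Calling `extract_titles("path/to/html/folder")` would then create save.txt in that folder with one title per line. Files without a title tag are skipped.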

          The Community of users of the Notepad++ text editor.