    How to remove duplicate entries in each line

    • zcraber

      I have an HTML file in which each line contains a duplicated href attribute, like this:

      <a href="https://example.com/" href="https://example.com/" target="_blank">Example</a>
      <a href="https://website.com/" href="https://website.com/" target="_blank">Website</a>
      <a href="https://sample.com/" href="https://sample.com/" target="_blank">Sample</a>
      

      How can I remove the duplicate href attribute in each line using Notepad++?

      • Alan Kilborn @zcraber

        @zcraber

        Maybe try:

        Find what: (?-s)(href="https://.+?\.com/" )(?=\1)
        Replace with: (leave empty)
        Search mode: Regular expression
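For anyone who wants to see why this works, or to run it outside Notepad++, here is a minimal sketch in Python. The group captures one href attribute plus its trailing space, and the lookahead `(?=\1)` only succeeds when an identical copy follows immediately, so just the first copy is deleted. The function name `drop_duplicate_href` is made up for illustration, and the `(?-s)` prefix is left out because Python's `.` does not match newlines by default.

```python
import re

# Same pattern as in the post, minus the (?-s) prefix:
# Python's "." already stops at line breaks unless re.DOTALL is set.
DUPLICATE_HREF = re.compile(r'(href="https://.+?\.com/" )(?=\1)')


def drop_duplicate_href(line: str) -> str:
    """Delete an href attribute that is immediately followed by an identical copy."""
    # The lookahead consumes nothing, so only the first copy is replaced.
    return DUPLICATE_HREF.sub("", line)


sample = '<a href="https://example.com/" href="https://example.com/" target="_blank">Example</a>'
print(drop_duplicate_href(sample))
# prints: <a href="https://example.com/" target="_blank">Example</a>
```

A line with a single href is left untouched, because the lookahead has nothing to match against.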

        • zcraber @Alan Kilborn

          Hi @alan-kilborn, thanks a lot for the reply. Appreciate your time.
          I forgot to mention that there are various domains like .org, .co, .gov etc in the file.

          Is there a regex that handles all of these?

          • Alan Kilborn @zcraber

            @zcraber said in How to remove duplicate entries in each line:

            I forgot to mention that

            Changes spec after solution is provided. :-(

            • Alan Kilborn @zcraber

              @zcraber said in How to remove duplicate entries in each line:

              there are various domains like .org, .co, .gov etc in the file

              (?-s)(href="https://.+?\.(?:com|org|gov)/" )(?=\1)
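Note that the alternation above covers only the domains it lists (com, org, gov). A possible variation, not something proposed in this thread, is to stop enumerating domains and match any quoted href value instead. The sketch below shows that idea in Python; the pattern `(href="[^"]*" )(?=\1)` and the name `drop_any_duplicate_href` are assumptions made for illustration.

```python
import re

# Hypothetical generalization (not from the thread): capture any quoted
# href value, so no list of top-level domains has to be maintained.
ANY_DUPLICATE_HREF = re.compile(r'(href="[^"]*" )(?=\1)')


def drop_any_duplicate_href(text: str) -> str:
    """Delete each href attribute that is immediately followed by an identical copy."""
    return ANY_DUPLICATE_HREF.sub("", text)


lines = [
    '<a href="https://example.com/" href="https://example.com/" target="_blank">Example</a>',
    '<a href="https://website.co/" href="https://website.co/" target="_blank">Website</a>',
    '<a href="https://sample.gov/" href="https://sample.gov/" target="_blank">Sample</a>',
]
for line in lines:
    print(drop_any_duplicate_href(line))
```

The trade-off is that this version no longer checks that the value looks like an https URL, so it would also collapse duplicated relative or http links.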

              • zcraber @Alan Kilborn

                @alan-kilborn Thank you.

                Changes spec after solution is provided. :-(

                Sorry about that. Next time I’ll be more specific. :)
