    Match tags whose contents are repeated in multiple files

    regex
    • Robin Cruise

      Good day. I want to use regular expressions to match tags whose contents are repeated across multiple files. Can anyone help me? For example:

      <title>THIS IS THE ONE</title>

      <title>44 5464 blah blah</title>

      <title>bebe is more then a letter</title>

      <title>THIS IS THE ONE</title>

      <title>destroy the enigma64 Joker</title>

      My desired output, the result of searching across multiple files, should be:

      <title>THIS IS THE ONE</title>

      <title>THIS IS THE ONE</title>

      • Vasile Caraus

        Try something like this:

        (?s)<title>([^<]*)</title>.*?<title>[^>]*>(?!\1)[^<]*</title>

        • Robin Cruise

          It’s not working.

          • gstavi

            Do you have a background in computer science? Do you know anything about algorithms?
            Finding identical elements in a large set is a difficult problem.
            Most reasonable solutions require sorting the set so that identical elements become adjacent.
            Regular expressions by themselves won’t do the trick. They are only the first step: extracting the tags. To find the duplicates you will need something like that.

            I don’t know awk, but extrapolating from this, I think that once you extract all titles into titles.txt, the following may work:
            awk 'seen[$0]++ == 2' titles.txt
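
For the extraction step mentioned above (getting every title into titles.txt), here is a minimal Python sketch. It is not from the original posts: the function names `extract_titles` and `write_titles`, the `*.html` file pattern, and the UTF-8 encoding are all assumptions chosen for illustration, and a regex like this only suits simple, well-formed `<title>` tags without attributes.

```python
import re
from pathlib import Path

# Non-greedy match of everything between <title> and </title>.
# DOTALL lets a title span several lines in the source file.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def extract_titles(text):
    """Return the contents of every <title> tag, with whitespace collapsed."""
    return [" ".join(match.split()) for match in TITLE_RE.findall(text)]


def write_titles(folder, pattern="*.html", out_name="titles.txt"):
    """Collect titles from all matching files in folder, one per line.

    The file pattern and output name are assumptions; adjust them to
    match the real files.
    """
    folder = Path(folder)
    titles = []
    for path in sorted(folder.glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        titles.extend(extract_titles(text))
    out_path = folder / out_name
    out_path.write_text("".join(t + "\n" for t in titles), encoding="utf-8")
    return titles
```

Calling `write_titles("C:/my/pages")` (a hypothetical folder) would produce a titles.txt with one title per line, which is the input the awk one-liner expects.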

            • gstavi

              Correcting myself: I didn’t look closely enough at the awk solution. I thought it printed a single copy of every line, but it actually prints the second instance of each duplicated line, so it will work just as is:
              awk 'seen[$0]++ == 1' titles.txt
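
For anyone without awk at hand, the same idea can be sketched in Python. This is only an illustration of what `seen[$0]++ == 1` does (count each line and emit it the second time it appears); the function name `second_occurrences` is made up for this example.

```python
def second_occurrences(lines):
    """Yield each line the second time it is seen.

    Mirrors awk 'seen[$0]++ == 1': the count is tested before it is
    incremented, so a line is emitted exactly once, on its second
    appearance, no matter how many more times it repeats.
    """
    seen = {}
    for line in lines:
        count = seen.get(line, 0)
        if count == 1:
            yield line
        seen[line] = count + 1


if __name__ == "__main__":
    # Titles from the first post in this thread.
    titles = [
        "THIS IS THE ONE",
        "44 5464 blah blah",
        "bebe is more then a letter",
        "THIS IS THE ONE",
        "destroy the enigma64 Joker",
    ]
    for title in second_occurrences(titles):
        print(f"<title>{title}</title>")  # prints <title>THIS IS THE ONE</title>
```

Note that this reports each repeated title once, not every occurrence. Because it uses a lookup table rather than sorting, the input does not need to be sorted first.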

              • Vasile Caraus

                The question is: how do I use awk on Windows?

                • guy038

                  Hello, @vasile-caraus,

                  You can download some GNU tools for Win32 from the link below:

                  https://code.google.com/p/gnu-on-windows/downloads/list

                  (The downloaded GAWK version is v4.1.0.)

                  The GAWK documentation can be downloaded from this link:

                  http://www.gnu.org/software/gawk/manual/

                  @vasile-caraus, GAWK is a very powerful Unix tool, but you’ll need some time to learn even its basic functions. For instance, the PDF reference manual is a 540-page file! Still, I’m sure it won’t take you long to find a short introduction to GAWK with a Google search!

                  Cheers,

                  guy038

                  The Community of users of the Notepad++ text editor.