    separate only duplicate numbers from file

    • gautam patel

      Hi Guys,

      Please suggest a way to extract only the duplicated lines from data in the format below. Any help is appreciated. Thanks in advance :)

      Input:

      919913209647 02:38:47
      919979418778 02:57:03
      918980055979 02:46:12
      919428616318 02:46:32
      919512672560 02:46:33
      919512646084 02:46:52
      919512497164 02:48:13
      919512497164 02:48:13
      919913029225 02:50:23
      917567814941 03:02:35
      919537722335 03:18:41
      918980299814 03:24:49
      919727009323 03:29:44

      Desired output:

      919512497164 02:48:13
      919512497164 02:48:13

      • Scott Sumner

        I would be inclined to use the “Mark” feature, possibly with the “Bookmark line” option enabled depending upon your real purpose here.

        Give it a try:
        Open the Find dialog box
        Select the Mark tab
        Tick “Bookmark line”
        Tick “Wrap around” (you may want this)
        Select “Regular expression”
        Leave everything else unchecked
        Find what: (?s)^(.*?)$\s+?^(?=.*^\1$)
        Press “Mark All”

        This will not highlight/bookmark every duplicate line, but it should highlight the first line of each duplicated set.

        Without knowing where you are going next with your data, it is tough to be more specific and/or suggest a better approach.
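For anyone who wants to check what that pattern matches outside the editor, here is a rough Python sketch. It assumes Python's `re` module treats this particular pattern the same way Notepad++'s regex engine does, which is an assumption rather than a guarantee; `marked_line_numbers` is a made-up helper name. It reports the line on which each match starts, which is the line that would get the bookmark.

```python
import re

# The "Find what" pattern from the post above. re.MULTILINE makes ^ and $
# work per line, as they do in the editor (assumed equivalent behaviour).
PATTERN = re.compile(r"(?s)^(.*?)$\s+?^(?=.*^\1$)", re.MULTILINE)

# Sample data copied from the original question.
SAMPLE = """\
919913209647 02:38:47
919979418778 02:57:03
918980055979 02:46:12
919428616318 02:46:32
919512672560 02:46:33
919512646084 02:46:52
919512497164 02:48:13
919512497164 02:48:13
919913029225 02:50:23
917567814941 03:02:35
919537722335 03:18:41
918980299814 03:24:49
919727009323 03:29:44
"""


def marked_line_numbers(text):
    """Return the 1-based line numbers on which a match of PATTERN starts."""
    return [text.count("\n", 0, m.start()) + 1 for m in PATTERN.finditer(text)]


print(marked_line_numbers(SAMPLE))  # prints [7]
```

The pattern has to rescan the rest of the text for every candidate line, so on a very large file it may be slow.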

        • gautam patel @Scott Sumner

          @Scott-Sumner No sir, it is not working…

          • Scott Sumner

            It works for me when I copy your sample data from here into a file and then follow my steps exactly, including copying and pasting the “Find what:” text. It highlights and bookmarks line 7 of your sample data.

            • gautam patel

              ^(.+?)\R(\1\R?)+

              I found a way to mark all the duplicate numbers. Can you suggest how to extract only the marked lines? The file has more than 90,000 lines…
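Note that a pattern of this shape can only catch repeats that sit on consecutive lines, as they do in the sample. A minimal Python sketch of that "adjacent repeats only" idea follows. It is not the regex itself (Python's `re` has no `\R`), and `adjacent_duplicate_runs` is a hypothetical name used only for illustration.

```python
from itertools import groupby


def adjacent_duplicate_runs(lines):
    """Keep only lines that are immediately repeated, in file order."""
    kept = []
    # groupby() collects consecutive equal items into one run.
    for _, run in groupby(lines):
        run = list(run)
        if len(run) > 1:
            kept.extend(run)
    return kept


lines = [
    "919512646084 02:46:52",
    "919512497164 02:48:13",
    "919512497164 02:48:13",
    "919913029225 02:50:23",
]
print(adjacent_duplicate_runs(lines))
```

If the duplicates in the real file are not guaranteed to be adjacent, sorting the lines first (or counting them instead) would be needed.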

              • Jim Dailey

                Here’s an AWK script that can do the trick for you:

                # If there is something other than whitespace on a line:
                NF {
                    # Use the text as an array index and count how many times it appears
                    Line[$0]++
                }
                
                # Once the whole file is done, print every line that appeared 2 or more
                # times, once for each time it appeared.
                #
                # If Line[line] == 1, then the line appeared only 1 time (it is unique).
                # If Line[line] > 1, then the line appeared that many times.
                # Note: "for (line in Line)" visits entries in an unspecified order, so the
                # output order may differ from the input order.
                END {
                    for (line in Line) {
                        for (i = 1; Line[line] > 1 && i <= Line[line]; i++) {
                            print line
                        }
                    }
                }
                

                I use GNU AWK for Windows (gawk.exe). If you save the script as dup.awk, then:

                gawk -f .\dup.awk <name of your 90000 line file>  > dupout.txt
                

                will create dupout.txt with all the duplicated lines. I used the data in your original post and let the output go to standard out:

                C:\temp\awk>type input.txt
                919913209647 02:38:47
                919979418778 02:57:03
                918980055979 02:46:12
                919428616318 02:46:32
                919512672560 02:46:33
                919512646084 02:46:52
                919512497164 02:48:13
                919512497164 02:48:13
                919913029225 02:50:23
                917567814941 03:02:35
                919537722335 03:18:41
                918980299814 03:24:49
                919727009323 03:29:44
                C:\temp\awk>gawk -f .\dup.awk input.txt
                919512497164 02:48:13
                919512497164 02:48:13
                
                C:\temp\awk>
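If AWK is not available, the same counting idea can be sketched in Python. This is only an illustration under my own assumptions, not a port of the script above: `duplicated_lines` is a made-up name, and unlike the AWK loop it prints the duplicates in their original input order.

```python
from collections import Counter


def duplicated_lines(lines):
    """Return every non-blank line that occurs more than once, in input order."""
    # Drop line endings so "x\n" and "x\r\n" count as the same line.
    stripped = [ln.rstrip("\r\n") for ln in lines]
    # Count only lines that contain something other than whitespace.
    counts = Counter(ln for ln in stripped if ln.strip())
    return [ln for ln in stripped if counts[ln] > 1]


sample = [
    "919512646084 02:46:52\n",
    "919512497164 02:48:13\n",
    "919512497164 02:48:13\n",
    "919913029225 02:50:23\n",
]
for line in duplicated_lines(sample):
    print(line)
```

To run it on a real file, pass the open file object instead of the list, for example `duplicated_lines(open("input.txt"))`, where `input.txt` stands in for your own file name. Counting is a single pass over the data, so 90,000 lines should not be a problem.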
                
                The Community of users of the Notepad++ text editor.