
separate only duplicate numbers from file

6 Posts 3 Posters 3.9k Views
  • gautam patel
    last edited Aug 16, 2016, 5:14 PM

    Hi guys,

    Could you kindly suggest a way to extract only the duplicate entries from the data below? All help is appreciated. Thanks in advance. :)

    The input is:

    919913209647 02:38:47
    919979418778 02:57:03
    918980055979 02:46:12
    919428616318 02:46:32
    919512672560 02:46:33
    919512646084 02:46:52
    919512497164 02:48:13
    919512497164 02:48:13
    919913029225 02:50:23
    917567814941 03:02:35
    919537722335 03:18:41
    918980299814 03:24:49
    919727009323 03:29:44

    Output

    919512497164 02:48:13
    919512497164 02:48:13

    • Scott Sumner
      last edited Aug 16, 2016, 5:40 PM

      I would be inclined to use the “Mark” feature, possibly with the “Bookmark line” option enabled depending upon your real purpose here.

      Give it a try:
      • Open the Find dialog box
      • Select the Mark tab
      • Put a checkmark in Bookmark line
      • Put a checkmark in Wrap around (you may want this)
      • Select "Regular expression"
      • Leave everything else unchecked
      • Find what: (?s)^(.*?)$\s+?^(?=.*^\1$)
      • Press "Mark All"

      This will not highlight/bookmark all the duplicate lines, but should highlight the first one of the set.

      Without knowing where you are going next with your data, it is tough to be more specific and/or suggest a better approach.
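For what it's worth, the same idea can be checked outside Notepad++. Here is a small Python sketch (an illustration only, assuming Unix line endings and a trimmed sample) that applies an equivalent of that search pattern:

```python
import re

# The "Find what" pattern: match a line whose exact text occurs again
# later in the file. (?s) lets the .* inside the lookahead cross
# newlines; re.M makes ^ and $ anchor at line boundaries, as in Notepad++.
PATTERN = re.compile(r'(?s)^(.*?)$\s+?^(?=.*^\1$)', re.M)

data = """919913209647 02:38:47
919512497164 02:48:13
919512497164 02:48:13
919913029225 02:50:23"""

# Group 1 captures the first line of each duplicated set.
matches = PATTERN.findall(data)
print(matches)  # → ['919512497164 02:48:13']
```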

      • gautam patel @Scott Sumner
        last edited Aug 16, 2016, 5:47 PM

        @Scott-Sumner No sir, it is not working…

        • Scott Sumner
          last edited Aug 16, 2016, 5:50 PM

          It works for me when I copy your sample data from here into a file and then follow my steps exactly, including copying and pasting the "Find what:" data… it highlights and bookmarks line 7 of your sample data.

          • gautam patel
            last edited Aug 16, 2016, 6:16 PM

            ^(.+?)\R(\1\R?)+

            I found a way to mark all the duplicate numbers. Can you suggest how to extract only the marked data? The file has more than 90,000 lines…
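That pattern can also be exercised outside Notepad++. A quick Python sketch (illustrative only; Python's re engine has no \R, so \n stands in for it, assuming Unix line endings) that extracts each matched run instead of just marking it:

```python
import re

# The pattern above: a line followed immediately by one or more exact
# copies of itself.
PATTERN = re.compile(r'^(.+?)\n(\1\n?)+', re.M)

data = """919512646084 02:46:52
919512497164 02:48:13
919512497164 02:48:13
919913029225 02:50:23"""

# group(0) of each match is a whole run of consecutive duplicate lines.
runs = [m.group(0) for m in PATTERN.finditer(data)]
print(runs)
```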

            • Jim Dailey
              last edited Aug 16, 2016, 9:01 PM

              Here’s an AWK script that can do the trick for you:

              # If there is something other than whitespace on a line:
              NF {
                  # Use the text as an array index and count how many times it appears
                  Line[$0]++
              }
              
              # Once the whole file has been read, print every line that was
              # duplicated 2 or more times, repeated the number of times it appeared.
              #
              # If Line[line] == 1, then the line appeared only 1 time (it is unique).
              # If Line[line] > 1, then the line appeared that many times.
              END {
                  for (line in Line) {
                      for (i = 1; Line[line] > 1 && i <= Line[line]; i++) {
                          print line
                      }
                  }
              }
              

              I use GNU AWK for Windows (gawk.exe). If you save the script as dup.awk, then:

              gawk -f .\dup.awk <name of your 90000 line file>  > dupout.txt
              

              will create dupout.txt with all the duplicated lines. I used the data in your original post and let the output go to standard out:

              C:\temp\awk>type input.txt
              919913209647 02:38:47
              919979418778 02:57:03
              918980055979 02:46:12
              919428616318 02:46:32
              919512672560 02:46:33
              919512646084 02:46:52
              919512497164 02:48:13
              919512497164 02:48:13
              919913029225 02:50:23
              917567814941 03:02:35
              919537722335 03:18:41
              918980299814 03:24:49
              919727009323 03:29:44
              C:\temp\awk>gawk -f .\dup.awk input.txt
              919512497164 02:48:13
              919512497164 02:48:13
              
              C:\temp\awk>
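For anyone without gawk on hand, here is a rough Python equivalent of the same counting approach (a sketch; the file names are placeholders):

```python
import sys
from collections import Counter

def duplicated_lines(lines):
    # Mirror the AWK script: count each non-blank line, then emit every
    # line seen more than once, repeated as many times as it appeared.
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    out = []
    for line, n in counts.items():
        if n > 1:
            out.extend([line] * n)
    return out

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g.  python dup.py input.txt > dupout.txt
    with open(sys.argv[1]) as f:
        print("\n".join(duplicated_lines(f)))
```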
              
              The Community of users of the Notepad++ text editor.