Here’s an AWK script that can do the trick for you:
# If there is something other than whitespace on a line:
NF {
    # Use the text as an array index and count how many times it appears.
    Line[$0]++
}

# Once the whole file has been read, print every line that appeared 2 or more
# times, repeated as many times as it appeared in the input.
#
# If Line[line] == 1, the line appeared only once (it is unique).
# If Line[line] > 1, the line appeared that many times.
END {
    for (line in Line) {
        for (i = 1; Line[line] > 1 && i <= Line[line]; i++) {
            print line
        }
    }
}
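If you would rather see each duplicated line only once, together with how many times it occurred, a small variant of the script along these lines should work (this changes the output format, so treat it as a sketch rather than a drop-in replacement):

# Count every non-blank line, then print each duplicate once with its count.
NF { Line[$0]++ }
END {
    for (line in Line) {
        if (Line[line] > 1) {
            print Line[line], line
        }
    }
}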
I use GNU AWK for Windows (gawk.exe). If you save the script as dup.awk, then:
gawk -f .\dup.awk <name of your 90000 line file> > dupout.txt
will create dupout.txt with all the duplicated lines. I used the data in your original post and let the output go to standard out:
C:\temp\awk>type input.txt
919913209647 02:38:47
919979418778 02:57:03
918980055979 02:46:12
919428616318 02:46:32
919512672560 02:46:33
919512646084 02:46:52
919512497164 02:48:13
919512497164 02:48:13
919913029225 02:50:23
917567814941 03:02:35
919537722335 03:18:41
918980299814 03:24:49
919727009323 03:29:44
C:\temp\awk>gawk -f .\dup.awk input.txt
919512497164 02:48:13
919512497164 02:48:13
C:\temp\awk>
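For what it's worth, if you'd rather not keep a separate dup.awk file, the same program can be passed directly on the command line. This sketch assumes cmd.exe quoting (double quotes around the program); adjust the quoting if you use a different shell:

gawk "NF { Line[$0]++ } END { for (line in Line) for (i = 1; Line[line] > 1 && i <= Line[line]; i++) print line }" input.txt > dupout.txt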