How can I count how many repeated numbers exist in a text? Only counting, not replacing or removing.

Rar Crash

I’m having a hard time trying to find a way to ONLY count how many duplicates exist in a bunch of numbers. Can someone help me, please?
ex:
05 14 13 18 19 20
25 05 20 17 16 03
Total repeated: 2
because 05 and 20 each appear twice.

So that is what I want to do. Simple: I don’t want to replace or remove anything, ONLY count how many repeated numbers exist.
Can someone help me? Is there any way I can do this without headaches?
Thanks.

guy038

Hello Rar Crash,

Could some numbers be present three times or more in your bunch of numbers? If NOT (so your numbers occur once or twice only), I’ve got a nice solution that uses, again, a regular expression:

Open a new file, or use the CTRL + N shortcut
Paste your bunch of numbers, ONLY, in this new file ( it doesn’t matter how this list of numbers is displayed )
Open the Find dialog, or use the CTRL + F shortcut
Select the Regular expression search mode
Type, in the Find what zone, the following regex (?s)\b(\d+)\b(?=.+\b\1\b)
Click on the Count button, or hit the ALT + T shortcut

Et voilà !!

Remarks :

Remember that the Count function runs through the whole contents of the current file. So watch out for any additional text that is NOT part of your bunch of numbers!
The location of the cursor has no influence on the resulting number of matches
The 3 options Wrap around, Match case and . matches newline can be set or unset without changing the result either

For instance, the following bunch of numbers, below :

10 14 13 8 1139 20
25 05 20 17 16 03
07 16 8 1139 05 10 07

which could be written, as the unsorted list, below :

or, even, as the unsorted single line, below :

10 14 13 10 16 20 25 05 20 17 03 8 1139 07 16 8 1139 05 07

would all produce the result Count: 7 matches, meaning that seven numbers are present twice in each group!
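For anyone who wants to double-check that count outside Notepad++, here is a minimal Python sketch. It assumes Python's re module treats this particular regex the same way Notepad++ does (it only uses (?s), \b, a back-reference and a look-ahead). The helper names count_regex_matches and count_repeated_values are made up for this illustration. The second helper is a different technique (a plain tally), shown only to illustrate why the regex count is reliable only when no number occurs three times or more.

```python
import re
from collections import Counter

# The regex from the post, unchanged.
DUPLICATE_RE = re.compile(r"(?s)\b(\d+)\b(?=.+\b\1\b)")


def count_regex_matches(text):
    """Number of matches of the regex, i.e. what the Count button reports."""
    return len(DUPLICATE_RE.findall(text))


def count_repeated_values(text):
    """Hypothetical cross-check: distinct numbers that occur more than once.

    Numbers are compared as text, so "5" and "05" are different values,
    exactly as they are for the regex.
    """
    occurrences = Counter(re.findall(r"\d+", text))
    return sum(1 for n in occurrences.values() if n > 1)


three_lines = "10 14 13 8 1139 20\n25 05 20 17 16 03\n07 16 8 1139 05 10 07"
single_line = "10 14 13 10 16 20 25 05 20 17 03 8 1139 07 16 8 1139 05 07"

print(count_regex_matches(three_lines))    # 7
print(count_regex_matches(single_line))    # 7
print(count_repeated_values(three_lines))  # 7

# A number present three times is matched twice by the regex,
# because its first AND second occurrences both have a later copy.
print(count_regex_matches("5 5 5"))        # 2
print(count_repeated_values("5 5 5"))      # 1
```

So, as long as every number occurs once or twice, both counts agree. With triples or more, the regex count is higher than the number of distinct repeated values.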

Best Regards,

guy038

P.S. :

It’s a bit late by now in France (04h05). As I said in my previous post, I’ll explain this regex tomorrow, after a well-deserved rest!

guy038

Hi all,

I’m back to give you some explanations on the regex, in my previous post

The main ideas, about that regex, are :

Match the first occurrence of a duplicated number
Verify that, further on, in the current file, a second occurrence of that exact number does exist
Keep the position of the regex engine scan, just after the last character of the first occurrence of the matched number, for later searches.

I started with the simple regex (\d+).+\1. Unfortunately, this regex can only detect duplicate numbers on the same line, because, by default, the dot character does NOT match the End of Line characters (such as \r and \n)

So, as the two numbers may be located on different lines, I improved the regex to the form (?s)(\d+).+\1, by adding the modifier (?s), which means that the dot character matches any character, End of Line characters included. For instance, the regex (?s).+ would match all the contents of a file, as if they were all on a single line!
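The effect of the (?s) modifier can be reproduced in a small Python sketch. This is only an illustration: it assumes Python's re module, where the dot also skips newlines unless (?s) is given, and the two-line sample text is made up for the test.

```python
import re

# Made-up sample: the only duplicate, 13, sits on two different lines.
text = "13 27\n46 13"

# Without (?s) the dot stops at the line break, so nothing matches.
same_line_only = re.search(r"(\d+).+\1", text)

# With (?s) the dot also matches the line break.
across_lines = re.search(r"(?s)(\d+).+\1", text)

# (?s).+ swallows the whole text, line break included.
everything = re.search(r"(?s).+", text)

print(same_line_only)               # None
print(repr(across_lines.group(0)))  # '13 27\n46 13'
print(everything.group(0) == text)  # True
```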

Now let’s imagine we apply our regex (?s)(\d+).+\1, against the bunch of numbers, below :

27 13 00 46 55
88 99 13 99 46

It first matches from the number 13 in the first line to the second number 13 in the second line. A second click on the Find next button would not find anything else, although the numbers 46 and 99 are, obviously, duplicate numbers. Unfortunately, by the time the regex engine reaches the second number 13, it has already moved past the first occurrences of 46 and 99!

So, we MUST keep the position of the regex engine just after the first number 13, in order to get other matches further on. An easy way to do it is to use a look-ahead. Thus, our regex becomes (?s)(\d+)(?=.+\1). Remember that the condition in the look-ahead must be satisfied for the entire regex to match. However, the text examined by the look-ahead is NEVER part of the final match. So, this time, the regex matches, successively, the 3 numbers 13, 46 and 99, only :-)
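Here is a minimal Python sketch of that difference, using the same two lines of numbers. It assumes Python's re module behaves like Notepad++ for these two regexes. re.findall returns the content of group 1 for each successive match, which plays the role of repeated clicks on Find next.

```python
import re

text = "27 13 00 46 55\n88 99 13 99 46"

# Consuming version: the single match runs from the first 13 to the
# second 13, so the search resumes AFTER the second 13.
consuming = re.findall(r"(?s)(\d+).+\1", text)

# Look-ahead version: only the number itself is consumed, so the
# search resumes right after the first occurrence of each number.
look_ahead = re.findall(r"(?s)(\d+)(?=.+\1)", text)

print(consuming)   # ['13']
print(look_ahead)  # ['13', '46', '99']
```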

Nice ! But there’s still a problem. Let’s run our regex (?s)(\d+)(?=.+\1) against this other bunch of numbers, below :

89 23 4567 00 1233 56

You’ll first match the number 23, because it also occurs inside the number 1233. Then the regex matches the 56 embedded in the number 4567, because the number 56 is also present at the end of the subject string

So, we need to delimit the numbers with the assertion \b, which represents the zero-length position between a word character and a non-word character, or the opposite. Therefore, the group (\d+) and the back-reference \1 are each surrounded by two \b assertions, giving the correct, final regex below:

(?s)\b(\d+)\b(?=.+\b\1\b)
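The false positives, and the way \b removes them, can be checked with a short Python sketch. Again, this assumes Python's re module handles \b like Notepad++ does for plain digits and spaces. The last sample string, "23 4567 23", is made up to show that a real duplicate is still found.

```python
import re

text = "89 23 4567 00 1233 56"

# Without \b, parts of longer numbers are accepted as duplicates.
without_boundaries = re.findall(r"(?s)(\d+)(?=.+\1)", text)

# With \b on both sides, only whole numbers can match, and this
# line contains no whole number twice.
with_boundaries = re.findall(r"(?s)\b(\d+)\b(?=.+\b\1\b)", text)

# A genuine duplicate is still detected by the final regex.
real_duplicate = re.findall(r"(?s)\b(\d+)\b(?=.+\b\1\b)", "23 4567 23")

print(without_boundaries)  # ['23', '56']
print(with_boundaries)     # []
print(real_duplicate)      # ['23']
```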

Have a good day and enjoy N++ !

Best regards

guy038

Rar Crash

Works perfectly, Thank you very much. :D