How can I count how many repeated numbers exist in a text? Counting only, no replacing or removing.



  • I’m having a hard time finding a way to ONLY count how many duplicates exist in a bunch of numbers. Can someone help me, please?
    Example:
    05 14 13 18 19 20
    25 05 20 17 16 03
    Total repeated: 2
    because 05 and 20 each appear twice.

    So that’s all I want to do. I don’t want to replace or remove anything, ONLY count how many repeated numbers exist.
    Is there any way I can do this without headaches?
    Thanks.



  • Hello Rar Crash,

    Could some numbers be present three times or more in your bunch of numbers? If NOT (so each number occurs once or twice only), I’ve got a nice solution that, again, uses a regular expression:

    • Open a new file, or use the CTRL + N shortcut

    • Paste ONLY your bunch of numbers into this new file (it doesn’t matter how the list of numbers is laid out)

    • Open the Find dialog, or use the CTRL + F shortcut

    • Select the Regular expression search mode and type, in the Find what zone, the following regex (?s)\b(\d+)\b(?=.+\b\1\b)

    • Click on the Count button, or hit the ALT + T shortcut

    Et voilà !!

    Remarks :

    • Remember that the Count function runs through the whole contents of the current file. So beware of any additional text that is NOT part of your bunch of numbers!

    • The location of the cursor has no effect on the resulting number of matches

    • The 3 options Wrap around, Match case and . matches newline can be set or unset without any change to the result, either


    For instance, the following bunch of numbers:

    10 14 13 8 1139 20
    25 05 20 17 16 03
    07 16 8 1139 05 10 07
    

    which could also be written as the unsorted list below:

    10
    13
    05
    8
    20
    07
    25
    05
    20
    17
    14
    1139
    07
    03
    16
    8
    1139
    10
    16
    

    or even as the unsorted single line below:

    10 14 13 10 16 20 25 05 20 17 03 8 1139 07 16 8 1139 05 07
    

    would all produce the result Count: 7 matches, meaning that seven numbers are present twice in each group!
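
    For anyone who wants to double-check that figure outside Notepad++, here is a minimal Python sketch. It assumes that Python's re module handles this particular regex the same way as the Notepad++ engine does (the constructs involved, (?s), \b, a look-ahead and a back-reference, exist in both). The helper name count_repeats is hypothetical, chosen only for this illustration.

```python
import re

# Same regex as the one typed in the "Find what" zone
PATTERN = re.compile(r'(?s)\b(\d+)\b(?=.+\b\1\b)')

def count_repeats(text):
    """Count the numbers that are followed, somewhere later, by an identical number."""
    return len(PATTERN.findall(text))

# The three layouts shown above
block = """10 14 13 8 1139 20
25 05 20 17 16 03
07 16 8 1139 05 10 07"""

one_per_line = "\n".join(
    "10 13 05 8 20 07 25 05 20 17 14 1139 07 03 16 8 1139 10 16".split()
)

single_line = "10 14 13 10 16 20 25 05 20 17 03 8 1139 07 16 8 1139 05 07"

for sample in (block, one_per_line, single_line):
    print(count_repeats(sample))  # 7 for each layout
```

    The layout makes no difference because (?s) lets the dot cross line breaks, so the look-ahead can always see the rest of the text.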

    Best Regards,

    guy038

    P.S. :

    It’s a bit late by now in France (04:05). As I said in my previous post, I’ll explain this regex tomorrow, after a well-deserved rest!



  • Hi all,

    I’m back to give you some explanations of the regex in my previous post.

    The main ideas behind that regex are:

    • Match the first occurrence of a duplicated number

    • Verify that, further on, in the current file, a second occurrence of that exact number does exist

    • Keep the position of the regex engine scan, just after the last character of the first occurrence of the matched number, for later searches.


    I started with the simple regex (\d+).+\1. Unfortunately, this regex can only detect duplicate numbers located on the same line, because the dot character does NOT match End of Line characters (such as \r and \n)

    So, as the two numbers may be located on different lines, I improved the regex to the form (?s)(\d+).+\1 by adding the modifier (?s), which means that the dot character matches every character, End of Line characters included. For instance, the regex (?s).+ would match the whole contents of a file, as if it were all on a single line!
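
    The effect of (?s) is easy to see in a small Python sketch. This is only an illustration under assumptions: the two-line sample is made up, its line break is a plain \n, and Python's re module stands in for the Notepad++ engine (in Python, too, the dot skips \n unless (?s) is given).

```python
import re

# Hypothetical two-line sample: 46 appears once on each line (line break is "\n")
text = "46 55\n13 46"

# Default dot: it stops at the line break, so the two 46s are never paired
same_line = re.search(r'(\d+).+\1', text)

# (?s) dot: it also matches "\n", so the pair can span both lines
across_lines = re.search(r'(?s)(\d+).+\1', text)

# The same difference on the bare .+ mentioned above
first_line_only = re.search(r'.+', text).group(0)
whole_text = re.search(r'(?s).+', text).group(0)

print(same_line)                    # None
print(repr(across_lines.group(0)))  # '46 55\n13 46'
print(repr(first_line_only))        # '46 55'
print(whole_text == text)           # True
```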

    Now let’s imagine we apply our regex (?s)(\d+).+\1 to the bunch of numbers below:

    27 13 00 46 55
    88 99 13 99 46
    

    It first matches from the number 13 in the first line to the second number 13 in the second line. A second click on the Find Next button would not find anything else, although the numbers 46 and 99 are obviously duplicated too. Unfortunately, once the regex engine has reached the second number 13, the first occurrences of 46 and 99 are already behind it!

    So we MUST keep the position of the regex engine just after the first number 13, in order to get the other matches further on. An easy way to do that is to use a look-ahead. Thus, our regex becomes (?s)(\d+)(?=.+\1). Remember that the condition in the look-ahead must be true for the entire regex to match. However, the text matched by a look-around is NEVER part of the overall match. So, this time, the regex matches, successively, the 3 numbers 13, 46 and 99 only :-)
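
    The difference between the two forms can be sketched in Python on the same bunch of numbers. This assumes that Python's re module behaves like the Notepad++ engine for these two regexes; re.finditer plays the role of repeated clicks on Find Next.

```python
import re

text = "27 13 00 46 55\n88 99 13 99 46"

# Consuming form: the match swallows everything up to the second 13,
# so the first occurrences of 46 and 99 are skipped
consuming = [m.group(1) for m in re.finditer(r'(?s)(\d+).+\1', text)]

# Look-ahead form: only the first occurrence is consumed,
# and the scan resumes right after it
lookahead = [m.group(1) for m in re.finditer(r'(?s)(\d+)(?=.+\1)', text)]

print(consuming)  # ['13']
print(lookahead)  # ['13', '46', '99']
```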

    Nice ! But there’s still a problem. Let’s run our regex (?s)(\d+)(?=.+\1) against this other bunch of numbers, below :

    89 23 4567 00 1233 56
    

    The regex first matches the number 23, because it also occurs inside the number 1233. Then it matches the digits 56 embedded in the number 4567, because the number 56 is also present at the end of the subject string.

    So we need to delimit the numbers with the assertion \b, which represents the zero-length position between a word character and a non-word character, or the reverse. Therefore, the group (\d+) and the back-reference \1 are each surrounded by two \b assertions, giving the right and final regex below:

    (?s)\b(\d+)\b(?=.+\b\1\b)
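
    As a rough check of what the \b assertions change, here is a minimal Python sketch, again assuming that Python's re module treats these regexes like the Notepad++ engine. The second sample, named triple, is made up. It illustrates the restriction mentioned in my previous post: a number present three times is matched twice, so the raw count no longer equals the number of distinct repeated values.

```python
import re

sample = "89 23 4567 00 1233 56"

# Without \b, digits buried inside longer numbers are matched too
loose = re.findall(r'(?s)(\d+)(?=.+\1)', sample)

# With \b on both sides, only whole numbers are compared
strict = re.findall(r'(?s)\b(\d+)\b(?=.+\b\1\b)', sample)

print(loose)   # ['23', '56']
print(strict)  # []

# Hypothetical sample where 5 occurs three times and 7 twice:
# every occurrence except the last one is matched
triple = "5 7 5 9 5 7"
hits = re.findall(r'(?s)\b(\d+)\b(?=.+\b\1\b)', triple)

print(hits)            # ['5', '7', '5']
print(len(set(hits)))  # 2 distinct repeated numbers
```

    In a script, removing the duplicates from the list of matches (here with set) is one way to get the number of distinct repeated values, however many times each one occurs. The Count button itself reports the raw number of matches.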

    Have a good day and enjoy N++ !

    Best regards

    guy038



  • Works perfectly, Thank you very much. :D

