    List frequency of duplicates

    • Muhammad Khan

      I currently have a list of words, one word per line. I’d like to get a list of how many times each word is duplicated.

      • PeterJones

        Notepad++ is a great tool, but some things are easier to solve with another tool.

        You can easily do this in most of the “scripting” languages, such as Perl, Python, or Lua. If you have Perl (see Strawberry Perl for an easy-to-install version for Windows), save this file as countWords.pl:

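        #!/usr/bin/perl
        use warnings;
        use strict;
        my %count;

        # Read each line of the input file(s) and tally how often it occurs.
        while (my $line = <>) {
            chomp $line;
            ++$count{$line};
        }

        # Print the most frequent lines first; break ties alphabetically.
        printf "%-20s => %d\n", $_, $count{$_}
            for sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;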
        

        Then run perl countWords.pl input.txt (where input.txt is the file, with one word per line), and it will print each word with its count, sorted by frequency (highest first) and then alphabetically.

        Or, if you have some of the Linux-like tools (GnuWin32, the git-bash shell, Cygwin, or the new WSL, the Windows Subsystem for Linux), then it could be done with a one-liner: sort list.txt | uniq -c | sort -nr (take the file, sort it so identical words are next to each other, collapse each run of identical lines into one line prefixed with its count, and sort the result in reverse numerical order).
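To illustrate why the one-liner sorts before counting, here is a rough Python sketch of the same three stages. The function name `uniq_c_pipeline` and the sample words are made up for this illustration, and the order of lines with equal counts may differ from what GNU sort prints.

```python
from itertools import groupby


def uniq_c_pipeline(lines):
    """Roughly mimic `sort | uniq -c | sort -nr` on an iterable of lines."""
    # Stage 1 (sort): bring identical lines next to each other.
    ordered = sorted(lines)
    # Stage 2 (uniq -c): collapse each run of adjacent duplicates
    # into a (count, line) pair. groupby only merges *adjacent* equal items,
    # which is why the sort in stage 1 is needed.
    counted = [(len(list(run)), line) for line, run in groupby(ordered)]
    # Stage 3 (sort -nr): highest count first.
    counted.sort(key=lambda pair: pair[0], reverse=True)
    return counted


# Hypothetical sample data, one word per entry.
sample = ["pear", "apple", "pear", "fig", "apple", "pear"]
for count, line in uniq_c_pipeline(sample):
    print("{:7d} {}".format(count, line))
```

Without stage 1, a word that appears in two separate places in the file would be reported twice, each time with a partial count.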

        Actually, if you grabbed sort and uniq from GnuWin32 and had them in your path (before c:\windows\system32\ in your path, otherwise it will use Windows’ sort, not GNU sort), then you could run that from Notepad++ via Run > Run, using cmd.exe /K "sort ^"$(FULL_CURRENT_PATH)^" | uniq -c | sort -nr".

        As an alternative to the Perl or GnuWin32 examples shown here, you could do it with either Python or Lua inside Notepad++ if you have the PythonScript or LuaScript plugins for Notepad++.
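A minimal Python sketch of the same counting idea, not a tested plugin script. The function name `count_lines` and the sample text are assumptions for illustration. Unlike the Perl script above, this version skips empty lines.

```python
from collections import Counter


def count_lines(text):
    """Return (line, count) pairs, most frequent first, ties alphabetical."""
    # Tally every non-empty line of the text.
    counts = Counter(line for line in text.splitlines() if line)
    # Sort by descending count, then alphabetically by the line itself.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample input, one word per line.
sample_text = "pear\napple\npear\nfig\napple\npear\n"
for word, count in count_lines(sample_text):
    print("{:<20} => {}".format(word, count))
```

Run standalone, the text would come from reading the input file. Inside the PythonScript plugin, the text of the current document could presumably be obtained with `editor.getText()` and the result written with `console.write(...)`; check those calls against the plugin's documentation before relying on them.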

        • Muhammad Khan

          Thank you very much. Do you know if there is any way to count the number of times a phrase is duplicated? For example, for the list below, I’d like to get the counts automatically, without inputting a phrase to look for. Please note that a username appears in front of each phrase, which I would not like to carry over.

          BELL TWIN DUO CORDLESS PHONES- AIR-02 2HANDSETS *2
          History Of Manchester United DVD *2

          NEWInstant Water Heater Shower Head
          Comment:great er!!! no hassles!!!
          rated on 07 Jun 2016
          stanton261 BELL TWIN DUO CORDLESS PHONES- AIR-02 2HANDSETS
          Comment:great er!!! no hassles!!!
          rated on 07 Jun 2016
          Rarebit314 Kenwood Prospero Chef Kitchen 900W Mixer + ATTACHMENTS
          Comment:great er!!! no hassles!!!
          rated on 07 Jun 2016
          Yvie240182 Manchester United - The Official History 1878 - 2008: 2 DVD Set
          Comment:great er!!! no hassles!!!
          rated on 07 Jun 2016
          ubombo57 History Of Manchester United DVD
          Comment:great er!!! no hassles!!!
          rated on 07 Jun 2016
          sanjayramroop4 History Of Manchester United DVD
          Comment:great er!!! no hassles!!!
          rated on 07 Jun 2016
          TonyS129 BELL TWIN DUO CORDLESS PHONES- AIR-02 2HANDSETS
          Comment:great er!!! no hassles!!!
          rated on 07 Jun 2016
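One way this kind of automatic phrase count could be approached is sketched below in Python. It rests on two assumptions that the sample data only partly supports: lines starting with "Comment:" or "rated on" are metadata to skip, and the first space-delimited token on every other line is the username. The names `SKIP_PREFIXES` and `count_phrases` are invented for this sketch. A line with no separate username would lose its first word.

```python
from collections import Counter

# Assumption: lines with these prefixes are feedback metadata, not item names.
SKIP_PREFIXES = ("Comment:", "rated on")


def count_phrases(lines):
    """Count item phrases after dropping the leading username token."""
    counts = Counter()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(SKIP_PREFIXES):
            continue
        # Assumption: the first whitespace-delimited token is the username.
        _username, _, phrase = line.partition(" ")
        if phrase:
            counts[phrase.strip()] += 1
    return counts


# A few lines taken from the list above.
sample = [
    "stanton261 BELL TWIN DUO CORDLESS PHONES- AIR-02 2HANDSETS",
    "Comment:great er!!! no hassles!!!",
    "rated on 07 Jun 2016",
    "ubombo57 History Of Manchester United DVD",
    "sanjayramroop4 History Of Manchester United DVD",
    "TonyS129 BELL TWIN DUO CORDLESS PHONES- AIR-02 2HANDSETS",
    "Rarebit314 Kenwood Prospero Chef Kitchen 900W Mixer + ATTACHMENTS",
]

# Report only phrases that occur more than once, in the "phrase *N" style.
for phrase, n in count_phrases(sample).most_common():
    if n > 1:
        print("{} *{}".format(phrase, n))
```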

          • Scott Sumner @Muhammad Khan

            @Muhammad-Khan

            Danger…warning…GETTING OFF-TOPIC…let’s get back to talking about Notepad++…

            • Muhammad Khan @Scott Sumner

              @Scott-Sumner This is about using Notepad++. How am I getting off-topic?

              • Scott Sumner @Muhammad Khan

                @Muhammad-Khan

                I don’t see anything about Notepad++ in this entire thread. What I see are data-content manipulations that don’t involve Notepad++. @PeterJones provided you a non-Notepad++ solution (Perl-based). If you want to extend or change that, then super, but please take that somewhere else (e.g. a Perl help site), because here we talk about Notepad++.

                @PeterJones said it well: Notepad++ is a great tool, but some things are easier to solve with another tool. And the problem is, we don’t get into in-depth conversations here about other tools.

                If you still don’t see why this thread isn’t Notepad++ related, please have a look at “baking cookies” in this FAQ.

                • Muhammad Khan @Scott Sumner

                  @Scott-Sumner I asked for a solution. I can’t control the responses of others. So review your own statements before you put nonsense up. And don’t talk here if you’re not providing a solution; you’re just wasting my time and extending this thread for no reason.

                  • Scott Sumner @Muhammad Khan

                    @Muhammad-Khan

                    I can’t control the responses of others.

                    Note that I am not negative about @PeterJones’s response at all. If you like what he proposed, run with it. But since it isn’t about Notepad++, take further discussion about it elsewhere if you need to. Ideally, you should spend some time learning about the solution provided if it is an unfamiliar technology to you, so that you can solve your own problems, such as extending it to do something different.

                    Do you know if there is any way to count the no. of times a phrase was duplicated

                    In the event that this question is back to Notepad++ and isn’t related to the @PeterJones Perl solution to the first thing you asked (I really have no idea), then you should have created a new discussion thread for this. However, let’s assume that it is a question about Notepad++, which will be gladly answered here.

                    You can certainly count the number of times a phrase is duplicated. In the Find window there is a Count button which will provide this functionality. Put your search phrase in the Find what zone and press the Count button and the number of occurrences that match will appear in the Find window’s status bar. In your case I guess the Find what would be HANDSETS.

                    • Muhammad Khan

                      It’s still an open question. I’m not further discussing it, so don’t say “warning, getting off topic”. And read the thread carefully: I said without inputting a statement.

                      • Scott Sumner @Muhammad Khan

                        @Muhammad-Khan said:

                        I’m not further discussing it

                        Why not, if you think it is something Notepad++ might be able to do? It’s your choice, but I think you may have developed an “attitude” that will preclude further discussion.

                        And read the thread carefully

                        Well, we try to but often we get posters here who aren’t that great at expressing their needs in English. So we try to make our best interpretation; sometimes things are missed/misinterpreted.

                        without inputting a statement

                        Well, you have to be a bit more precise about what you want. Without “inputting a statement”, do you want Notepad++ to read your mind about what you want to search for? Or is it supposed to search for every possible combination of characters or words or string of words that appear elsewhere in your document? We are definitely willing to assist with the things that Notepad++ can do for you, if you make that need clear.

                        • Muhammad Khan

                          You’re an absolute fucking retard

                          • Scott Sumner @Muhammad Khan

                            @Muhammad-Khan

                            You’re an absolute f***ing retard

                            Maybe…but if so it doesn’t make sense that I have far more “reputation points” on this site than anyone else. Hmmmm…

                            The Community of users of the Notepad++ text editor.