    Count the occurrences of each line

      A Former User

      Hello, I don’t know if this is possible with Notepad++, but this is what I want to do:

      Imagine that you have a text file which contains:

      Hello my friend
      Ho
      A
      XT
      Hello my friend
      A
      Ha

      So I want to count the occurrences of each line; the output I want would look like this:

      Hello my friend 2
      Ho 1
      A 2
      XT 1
      Ha 1

      Do you think it’s possible?

      Thank you !

        PeterJones

        @JSE-Faucet said:

        I don’t know if this is possible with Notepad++

        Unfortunately, not with Notepad++ alone. Paired with one of the scripting-language plugins (PythonScript, LuaScript, or “jN Notepad++ Plugin”), you could run a script to do that inside Notepad++. (Or you could use any of those languages, or another programming language, to do the same thing without Notepad++'s help.)
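
As an illustration of the "without Notepad++'s help" route, here is a minimal standalone sketch in plain Python 3 (not PythonScript). The function names `count_lines` and `format_counts` are made up for this example, and it assumes Python 3.7 or newer, where a regular dict keeps keys in insertion order.

```python
def count_lines(lines):
    """Count how often each line occurs, keeping first-seen order."""
    counts = {}  # assumption: Python 3.7+, so dict preserves insertion order
    for line in lines:
        key = line.rstrip("\r\n")  # drop only the line ending
        counts[key] = counts.get(key, 0) + 1
    return counts


def format_counts(counts):
    """Turn the counts into 'line count' rows, as in the requested output."""
    return ["{} {}".format(key, n) for key, n in counts.items()]


# the example data from the question
sample = ["Hello my friend", "Ho", "A", "XT", "Hello my friend", "A", "Ha"]
for row in format_counts(count_lines(sample)):
    print(row)
```

To run it on a real file, the list could be replaced with an open file object, since iterating over a file yields its lines one at a time.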

          PeterJones

          I decided, since it had been a while since I’d last tested my PythonScript chops and I had a few minutes, to see if I could implement it. This replicates your results:

          # encoding=utf-8
          """in response to https://notepad-plus-plus.org/community/topic/17744/"""
          from Npp import *
          
          def forum_post17744_FunctionName():
              """
              this uses each line as a key in a dictionary, to count how many entries;
              to preserve order, it also stores the key in a list
              """
          
              # initialize dictionary and ordered list
              count = dict()
              order = list()
          
              # parse the active editor's text
              for lnum in range(editor.getLineCount()):
                  editor.gotoLine(lnum)
                  key = editor.getCurLine().rstrip()
                  if key in count:
                      count[key] = count[key] + 1
                  else:
                      count[key] = 1
                      order.append(key)
          
              # make the changes:
              #   delete old contents,
              #   insert key + count + EOL for each of the unique lines, in original order
              editor.beginUndoAction()
              editor.selectAll()
              editor.deleteBack()
              for key in order:
                  editor.addText("{} {}\r\n".format(key, count[key]))
              editor.endUndoAction()
          
          
          if __name__ == '__main__': forum_post17744_FunctionName()
          

          To use this,

          1. Install PythonScript plugin using Plugins > Plugins Admin
          2. Plugins > Python Script > New Script, give it a name like CountUniqueLines.py
          3. Paste in the code and save
          4. Open your example file in the active editor pane
          5. Plugins > Python Script > Scripts > CountUniqueLines

          Enjoy your free code-writing service.

            A Former User

            Are you sure that it will work with very big files (90MB)?

              Alan Kilborn @A Former User

              @JSE-Faucet

              work with very big files

              One way to find out…

              If Notepad++ itself can handle the file (and it should be fine, 90MB is not THAT big), such a script can handle processing it…

                A Former User

                Sadly, I can’t use PythonScript on my computer… I don’t know why, though. I had to install it with their installer because it wasn’t in the list. Now it is installed, but it does not even show up in NPP :(

                  PeterJones

                  @JSE-Faucet said:

                  Are you sure that it will work with very big files (90MB)?

                  No, I am not sure; though, as @Alan-Kilborn said, 90MB is pretty small in the modern scheme of things. Given your problem statement, I thought this was sufficient for getting through a one-time need. (And really, even that much is going above and beyond, since this is a forum about Notepad++, not a free code-writing service.)

                  I have ideas on how the script could be modified to handle what you want. But it would take more investment of time for me, and I’ve already fulfilled my desire to see how I would give this a quick implementation.

                  And actually, I remembered that nearly this functionality already exists. The Linux command uniq -c will almost get you what you want: sort infile.txt | uniq -c will put all identical lines next to each other and then count the duplicates. However, it will not preserve your original order. If that’s of interest to you, see http://gnuwin32.sourceforge.net/packages/coreutils.htm for a Windows implementation.
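
To show why the original order is lost, here is a rough Python 3 sketch of what a sort-then-count pipeline computes. It is not the coreutils implementation: `sort_uniq_count` is a made-up name, Python's `sorted` compares by code point whereas `sort` follows the locale's collation rules, and the print formatting only approximates what `uniq -c` emits.

```python
from itertools import groupby


def sort_uniq_count(lines):
    """Sort the lines, then count each run of identical neighbours.

    Returns (count, line) pairs in sorted order, not in input order.
    """
    return [(len(list(group)), line) for line, group in groupby(sorted(lines))]


# the example data from the question
sample = ["Hello my friend", "Ho", "A", "XT", "Hello my friend", "A", "Ha"]
for count, line in sort_uniq_count(sample):
    print("{:7d} {}".format(count, line))
```

The counts match the requested output, but "A" now comes first and "Hello my friend" lands in the middle, because grouping only works on adjacent lines and sorting is what makes duplicates adjacent.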

                    PeterJones

                    @JSE-Faucet said:

                    Sadly, I can’t use PythonScript on my computer… I don’t know why, though. I had to install it with their installer because it wasn’t in the list. Now it is installed, but it does not even show up in NPP :(

                    Oh, right, I forgot it doesn’t install quite right out of the box for 7.6.x and newer. @Meta-Chuh wrote a Guide to Installing PythonScript Plugin on Notepad++ 7.6.3 and above.

                      PeterJones

                      @PeterJones said:

                      I’ve already fulfilled my desire to see

                      I guess I hadn’t. I wanted to see if my ideas worked. They did.

                      I removed the order[] array and just re-parsed the file to maintain order. I also changed the key from the line of text itself (which in real data would typically be 20-80 characters, not just the dozen or fewer in the example) to a crc32 of the line, so every line is mapped to a 32-bit (4-byte) key.

                      On the first pass, it builds the dictionary of crc32:count pairs; on the second pass, it either adds the count to the end of the line, or deletes the line if that key's count has already been used.

                      Caveat: 32 bits is too small to guarantee no collisions, but it’s likely good enough. If not, one could probably use a trick to combine two different 32-bit hashes, like maybe crc32(txt)+crc32(txt.reverse), assuming the crc of the reversed text is different; I think so, but am not sure.
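
The two-pass idea can be sketched outside Notepad++ in plain Python 3, working on a list of lines instead of the editor buffer. This is an illustration, not the script from this post: `line_key`, `wide_key`, and `count_two_pass` are made-up names, and `wide_key` is only one guess at how the two-hash trick might be combined (packing both values into 64 bits rather than adding them). Whether the reversed-text crc really helps against collisions stays an open question here too.

```python
import zlib


def line_key(text):
    """32-bit key for a line; crc32 wants bytes in Python 3."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def wide_key(text):
    """Hypothetical 64-bit key: crc32 of the bytes and of the reversed bytes."""
    data = text.encode("utf-8")
    forward = zlib.crc32(data) & 0xFFFFFFFF
    backward = zlib.crc32(data[::-1]) & 0xFFFFFFFF
    return (forward << 32) | backward


def count_two_pass(lines, key_func=line_key):
    """Pass 1 counts by hashed key; pass 2 emits each line once, in order."""
    counts = {}
    for line in lines:  # pass 1: build key -> count
        key = key_func(line.rstrip())
        counts[key] = counts.get(key, 0) + 1

    output = []
    for line in lines:  # pass 2: first sighting gets the count, repeats are dropped
        text = line.rstrip()
        key = key_func(text)
        if key in counts:
            # pop() removes the key, which marks this line as already emitted
            output.append("{} {}".format(text, counts.pop(key)))
    return output


# the example data from the question
sample = ["Hello my friend", "Ho", "A", "XT", "Hello my friend", "A", "Ha"]
print("\n".join(count_two_pass(sample)))
```

The dictionary holds only integer keys and counts, so its size depends on the number of distinct lines rather than on their length. The trade-off is that two different lines with the same hash would be counted as one.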

                      IyBlbmNvZGluZz11dGYtOA0KIiIiaW4gcmVzcG9uc2UgdG8gaHR0cHM6Ly9ub3Rl
                      cGFkLXBsdXMtcGx1cy5vcmcvY29tbXVuaXR5L3RvcGljLzE3NzQ0LyIiIg0KZnJv
                      bSBOcHAgaW1wb3J0ICoNCmltcG9ydCB6bGliDQoNCmNvbnNvbGUuY2xlYXIoKQ0K
                      DQpkZWYgZm9ydW1fcG9zdDE3NzQ0X0Z1bmN0aW9uTmFtZSgpOg0KICAgICIiIg0K
                      ICAgIHRoaXMgdXNlcyBhIGhhc2ggb2YgZWFjaCBsaW5lIGFzIGEga2V5IGluIGEg
                      ZGljdGlvbmFyeSwgdG8gY291bnQgaG93IG1hbnkgZW50cmllczsNCiAgICB0byBw
                      cmVzZXJ2ZSBvcmRlciwgaXQgd2lsbCBydW4gdGhyb3VnaCB0aGUgZmlsZSBhIHNl
                      Y29uZCB0aW1lDQogICAgKHRoaXMgc2F2ZXMgbWVtb3J5IG9mIHdob2xlLWxpbmUg
                      a2V5cywgYW5kIGFuIGFycmF5IHRvIGhvbGQgb3JkZXIpDQogICAgIiIiDQoNCiAg
                      ICAjIGluaXRpYWxpemUgZGljdGlvbmFyeSBhbmQgb3JkZXJlZCBsaXN0DQogICAg
                      Y291bnQgPSBkaWN0KCkNCg0KICAgICMgcGFyc2UgdGhlIGFjdGl2ZSBlZGl0b3In
                      cyB0ZXh0DQogICAgZm9yIGxudW0gaW4gcmFuZ2UoZWRpdG9yLmdldExpbmVDb3Vu
                      dCgpKToNCiAgICAgICAgZWRpdG9yLmdvdG9MaW5lKGxudW0pDQogICAgICAgIGtl
                      eSA9IHpsaWIuY3JjMzIoZWRpdG9yLmdldEN1ckxpbmUoKS5yc3RyaXAoKSkgJiAw
                      eEZGRkZGRkZGDQogICAgICAgIGlmIGtleSBpbiBjb3VudDoNCiAgICAgICAgICAg
                      IGNvdW50W2tleV0gPSBjb3VudFtrZXldICsgMQ0KICAgICAgICBlbHNlOg0KICAg
                      ICAgICAgICAgY291bnRba2V5XSA9IDENCg0KICAgICMgbWFrZSB0aGUgY2hhbmdl
                      czoNCiAgICBlZGl0b3IuYmVnaW5VbmRvQWN0aW9uKCkNCiAgICBsbnVtID0gMA0K
                      ICAgIHdoaWxlIGxudW0gPCBlZGl0b3IuZ2V0TGluZUNvdW50KCk6ICAgICAjIHVz
                      ZSBhIHdoaWxlIGxvb3AgcmF0aGVyIHRoYW4gZm9yIGxvb3AsIHNvIEkgY2FuIGNo
                      b29zZSBfbm90XyB0byBhZHZhbmNlIGxudW0gYWZ0ZXIgaSBkZWxldGUgdGhlIHJv
                      dyAoYmVjYXVzZSAibmV4dCIgcm93IHdpbGwgaGF2ZSBzYW1lIGxudW0gYXMgdGhl
                      IGRlbGV0ZWQgcm93KQ0KICAgICAgICBlZGl0b3IuZ290b0xpbmUobG51bSkNCiAg
                      ICAgICAga2V5ID0gemxpYi5jcmMzMihlZGl0b3IuZ2V0Q3VyTGluZSgpLnJzdHJp
                      cCgpKSAmIDB4RkZGRkZGRkYNCiAgICAgICAgaWYga2V5IGluIGNvdW50Og0KICAg
                      ICAgICAgICAgI2NvbnNvbGUud3JpdGUoIns6MDEwWH18e318e31cbiIuZm9ybWF0
                      KGtleSwgZWRpdG9yLmdldEN1ckxpbmUoKS5yc3RyaXAoKSwgY291bnRba2V5XSkp
                      DQogICAgICAgICAgICBlZGl0b3IubGluZUVuZCgpDQogICAgICAgICAgICBlZGl0
                      b3IuYWRkVGV4dCgiIHt9Ii5mb3JtYXQoY291bnRba2V5XSkpDQogICAgICAgICAg
                      ICBkZWwgY291bnRba2V5XSAgIyBkb24ndCB3YW50IHRvIGhhdmUgZHVwbGljYXRl
                      cywgc28gcmVtb3ZlIHRoZSBrZXkgdG8gaW5kaWNhdGUgSSdtIGRvbmUNCiAgICAg
                      ICAgICAgIGxudW0gPSBsbnVtICsgMQ0KICAgICAgICBlbHNlOg0KICAgICAgICAg
                      ICAgI2NvbnNvbGUud3JpdGUoIns6MDEwWH18e318e31cbiIuZm9ybWF0KGtleSwg
                      ZWRpdG9yLmdldEN1ckxpbmUoKS5yc3RyaXAoKSwgIk5FRUQgVE8gREVMRVRFIExJ
                      TkUiKSkNCiAgICAgICAgICAgIGVkaXRvci5saW5lRGVsZXRlKCkNCiAgICBlZGl0
                      b3IuZW5kVW5kb0FjdGlvbigpDQoNCmlmIF9fbmFtZV9fID09ICdfX21haW5fXyc6
                      IGZvcnVtX3Bvc3QxNzc0NF9GdW5jdGlvbk5hbWUoKQ0K
                      

                      (Obfuscated to avoid a spoiler for the solution. If you need a hint, that’s base64 encoded, and Notepad++ usually ships with MIME Tools plugin…)
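
If the MIME Tools plugin is not at hand, Python's standard `base64` module can decode the block as well. The sketch below (Python 3, with a made-up helper name `decode_block`) decodes only the first 24 characters of the block above, so it gives nothing away; it assumes the decoded text is UTF-8.

```python
import base64


def decode_block(encoded_text):
    """Decode base64 text that may be wrapped across several lines."""
    compact = "".join(encoded_text.split())  # remove line breaks and indentation
    return base64.b64decode(compact).decode("utf-8")  # assumption: UTF-8 payload


# first 24 characters of the encoded block; paste the whole block to get the script
first_chunk = "IyBlbmNvZGluZz11dGYtOA0K"
print(repr(decode_block(first_chunk)))
```

The chunk decodes to the script's first line, `# encoding=utf-8`, followed by a Windows line ending.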

                        Alan Kilborn @PeterJones

                        @PeterJones

                        OMG. Peter, I think this Community is like a drug for you…or maybe for more than just you.

                        :)
