    Looking for a plugin that can mark "similar" lines in a text file.

      LordP666

      What I’m trying to do is find lines that are similar but NOT identical in a large text file.

      Sort of like this:

      His name is Pablo Escobar, but he lives in
      Pable Escobar is a drug dealer

      I’d like the two lines highlighted or marked in some way.

      The text file has 9800 lines, and I’m simply trying to find similar text on two or more lines.

      I can’t do searches because I don’t know ahead of time what is duplicated; for instance, I’d have no way of knowing to search for “Pablo Escobar”.

      Can this be done?

      Thanks.

        Terry R @LordP666

        @LordP666 said in Looking for a plugin that can mark "similar" lines in a text file.:

        Can this be done?

        If you were aware that Notepad++ is a text editor, not a word processor, then it should be fairly obvious that Notepad++ is not the tool to attempt this.

        Firstly, there aren’t any built-in functions to “simply” achieve this. Secondly, if you looked through the long list of plugins, I doubt you would find anything there to achieve it either.

        If I HAD to use Notepad++ to try to achieve something like this, I would do the following (on a copy of the original file, because the process is destructive):

        1. Put each word on its own line.
        2. Lexicographically order them (ascending).
        3. Perform a series of regular expression replacements to remove the common function words (“stop words”) such as is, the, are, in, etc.
        4. Count the number of repeats for each remaining word. I actually did this some time ago. See Topic #20905 if you really need to see how it might be done, although that would need adjustment to fit this requirement.
        5. Now formulate a number of regular expressions to be used on the original file to find those “similar lines” you seek. Note that this doesn’t really help, as it works on single words, not paired words like the ones in your example. How does a program know which words are paired anyway, even a word processor?
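
As a rough illustration of steps 1 to 4 above, here is a minimal sketch in plain Python (run outside Notepad++). The function name `repeated_words` and the short `STOP_WORDS` list are made up for this example, not taken from any plugin or from Topic #20905, and a real stop-word list would need to be much longer.

```python
import re
from collections import Counter

# Illustrative stop-word list only (an assumption); extend it for real text.
STOP_WORDS = {"a", "an", "and", "are", "but", "he", "his", "in", "is", "the"}


def repeated_words(text, stop_words=STOP_WORDS):
    """Return {word: count} for non-stop words that occur more than once."""
    # Step 1: split the text into words, lower-cased so "Escobar" == "escobar".
    words = re.findall(r"[a-z']+", text.lower())
    # Step 3: drop the common function words.
    kept = [w for w in words if w not in stop_words]
    # Steps 2 and 4: count the words and list the repeats in ascending order.
    counts = Counter(kept)
    return {w: n for w, n in sorted(counts.items()) if n > 1}


sample = (
    "His name is Pablo Escobar, but he lives in\n"
    "Pable Escobar is a drug dealer\n"
)
print(repeated_words(sample))
```

On the two sample lines from the question this reports only `escobar`, because "Pablo" and "Pable" are different words as far as an exact count is concerned.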

        Actually, all the above steps could be done more easily in Notepad++ with the PythonScript plugin or a similar scripting plugin, but it would still work only on single words.
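
To give an idea of what such a script might look like, here is a hedged sketch in plain Python that reports pairs of lines sharing at least one uncommon word. The name `similar_line_pairs`, the `min_shared` parameter and the tiny `STOP_WORDS` list are assumptions made for this example. Inside PythonScript the lines would have to come from the editor buffer instead of a hard-coded list, which is not shown here.

```python
import re
from collections import defaultdict
from itertools import combinations

# Illustrative stop-word list only (an assumption); extend it for real text.
STOP_WORDS = {"a", "an", "and", "are", "but", "he", "his", "in", "is", "the"}


def similar_line_pairs(lines, min_shared=1, stop_words=STOP_WORDS):
    """Return {(i, j): shared_words} for 1-based line numbers i < j."""
    # Map each non-stop word to the set of line numbers that contain it.
    lines_by_word = defaultdict(set)
    for number, line in enumerate(lines, start=1):
        for word in set(re.findall(r"[a-z']+", line.lower())):
            if word not in stop_words:
                lines_by_word[word].add(number)

    # For every pair of lines, collect the words the two lines have in common.
    shared = defaultdict(set)
    for word, numbers in lines_by_word.items():
        for pair in combinations(sorted(numbers), 2):
            shared[pair].add(word)

    # Keep only the pairs that share enough words.
    return {
        pair: sorted(words)
        for pair, words in shared.items()
        if len(words) >= min_shared
    }


lines = [
    "His name is Pablo Escobar, but he lives in",
    "Pable Escobar is a drug dealer",
    "Nothing in common here",
]
for (first, second), words in similar_line_pairs(lines).items():
    print(first, second, words)
```

This is still matching on single, exactly spelled words, so "Pablo" and "Pable" do not count as the same. A word that appears on many lines also produces a large number of pairs, so on a 9800-line file the stop-word list and `min_shared` would need tuning.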

        After reading through all these steps, let me suggest again (strongly) that Notepad++, with or without plugins, is NOT the application to help you with this.

        Terry

        PS: I note that a previous question you posted also bordered on a need that only a word processor would have, not a text editor. If you really are in the “word processing” support arena, then I think you will find Notepad++ is not going to assist you much at all.

          LordP666 @Terry R

          @Terry-R

          Fair enough. I thought I’d give it a shot.

          Thanks for your time and suggestions.

          The Community of users of the Notepad++ text editor.