Looking for a plugin that can mark "similar" lines in a text file.
-
What I’m trying to do is find lines that are similar but NOT identical in a large text file.
Sort of like this:
His name is Pablo Escobar, but he lives in
Pable Escobar is a drug dealer
I’d like the two lines highlighted or marked in some way.
The text file has 9800 lines, and I’m simply trying to find similar text on two or more lines.
I can’t search for them because I don’t know ahead of time what is duplicated; for instance, I’d have had no way of knowing to search for “Pablo Escobar”.
Can this be done?
Thanks.
-
@LordP666 said in Looking for a plugin that can mark "similar" lines in a text file.:
Can this be done?
If you were aware that Notepad++ is a text editor, not a word processor, then it should be fairly obvious that Notepad++ is not the tool to attempt this.
Firstly there aren’t any built-in functions to “simply” achieve this. Secondly, if you did look at the long list of Plugins I doubt you would find anything there either to achieve it.
If I HAD to use Notepad++ to try and achieve something like this I would do the following (on a copy of the original file as it is destructive):
- Put each word on its own line.
- Lexicographically order them (ascending).
- Perform a series of regular expressions to remove the common “filler” words (articles, prepositions and auxiliary verbs) such as is, the, are, in etc.
- Count the number of repeats for each remaining word. I actually did this some time ago; see Topic #20905 if you really need to see how it might be done, although that would need adjustment to fit this requirement.
- Now formulate a number of regular expressions that would be used on the original file to find those “similar lines” you seek. Note that this doesn’t really help, as it works on single words, not the paired words in your example. How does a program know which words are paired anyway, even a word processor?
Actually, all the above steps could be done more easily in Notepad++ using the PythonScript plugin or a similar scripting language, but it would still be working on single words.
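A minimal sketch of those steps in plain Python, run outside Notepad++ rather than through PythonScript. The stop-word list and the function names are assumptions made for illustration, not part of Notepad++ or any plugin:

```python
import re
from collections import Counter, defaultdict

# Assumed stop-word list (hypothetical); extend it to suit the text.
STOP_WORDS = {"is", "the", "are", "in", "a", "an", "but", "he", "his", "of", "and"}


def words_in(line):
    """Return the lowercase words of a line, minus the stop words."""
    return [w for w in re.findall(r"[a-z']+", line.lower()) if w not in STOP_WORDS]


def repeated_words(lines):
    """Count on how many lines each remaining word appears; keep words found on 2+ lines."""
    counts = Counter()
    for line in lines:
        counts.update(set(words_in(line)))
    return {word: n for word, n in counts.items() if n >= 2}


def lines_sharing_words(lines):
    """Map each repeated word to the 1-based numbers of the lines containing it."""
    repeated = repeated_words(lines)
    hits = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word in sorted(set(words_in(line))):
            if word in repeated:
                hits[word].append(number)
    return dict(hits)


if __name__ == "__main__":
    sample = [
        "His name is Pablo Escobar, but he lives in",
        "Something unrelated goes here",
        "Pable Escobar is a drug dealer",
    ]
    for word, numbers in sorted(lines_sharing_words(sample).items()):
        print(word, numbers)
```

With the two example lines from the question, this reports only `escobar` as shared between lines 1 and 3. “Pablo” and “Pable” are treated as different words, which is exactly the single-word limitation described above.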
After reading through all these steps let me suggest (strongly again) that Notepad++ with or without plugins is NOT the application to help you in this.
Terry
PS I note that a previous question you posted was again bordering on a need that only a word processor would have, not a text editor. If you really are in the “word processing” support arena then I think you will find Notepad++ is not going to assist you much at all.
-