Multiword highlight

Vini Dope

I've got a text file (#1) that looks like this:
Xxxxxxxx
Zzzzzzzz
Ccccccc
Vvvvvvv
Bbbbbb
Nnnnnn

And I have another text file (#2) with lots of words.
What I want to do is this: if text file #2 has a word that matches one in #1, I want it marked, and I also want the one above it marked. So if I search for vvvvvvv it should mark both vvvvvvv and cccccccc.
Also, the text file contains 10k+ lines, so I can't do it manually. If Notepad++ can't do it, is there any other way to do it?

Claudia Frank

@Vini-Dope

First, afaik, npp doesn't have such a function, but what it does have is highlighting in the second view,
which means that if you double-click a word in the first view, the same word gets highlighted in the second view.
It is available under Settings -> Preferences.

If this doesn't suit you, then, theoretically, it is possible to write a Python script, which means
you need to install the Python Script plugin in order to make this work. If you want to
go this way, let us know what exactly needs to be done, because it makes a huge difference whether your text looks like

word1
word2
word3
…

or

word1 word2 word3 word4 word5 word6 word7 word8 word9 word10
word11 word12 …

Cheers
Claudia
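
A rough idea of what such a script might compute, written as plain Python rather than against the Python Script plugin API. This is only a sketch under several assumptions: both files hold one word per line, "the one above" means the preceding line in file #2, and matching is case-sensitive. The function name `lines_to_mark` and the sample data are made up for illustration.

```python
def lines_to_mark(word_list, text_lines):
    """Return sorted 0-based indexes of lines to mark: each match plus the line above it."""
    # Assumption: one word per line, compared after stripping whitespace.
    wanted = {w.strip() for w in word_list if w.strip()}
    marked = set()
    for i, line in enumerate(text_lines):
        if line.strip() in wanted:
            marked.add(i)          # the matching line itself
            if i > 0:
                marked.add(i - 1)  # the line directly above it
    return sorted(marked)


words = ["Vvvvvvv"]
text = ["Aaaaaaa", "Ccccccc", "Vvvvvvv", "Bbbbbb"]
result = lines_to_mark(words, text)
print(result)
```

If the text has several words per line instead, the comparison would have to work on words rather than whole lines, which is why the layout matters.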

Scott Sumner

@Vini-Dope :

A variant on @Claudia-Frank 's idea could be to use the Styling feature (see Search menu -> Mark All -> Using 1st Style and related) to highlight all of your words, possibly in different colors. But this is a manual process for each word and doesn’t really take what you want to do to its full extent.

Here’s another idea that also doesn’t go as far as you want but may be useful anyway. With the basic idea inherited from this thread, try the following:

Step 1: Add this to the bottom of text file #2 (temporarily, can be removed later):

===word-list===
Xxxxxxxx
Zzzzzzzz
Ccccccc
Vvvvvvv
Bbbbbb
Nnnnnn

Step 2: Invoke the Mark… feature (Search menu) and set up the following:

Find what zone: (?-i)\<(\w+)\>(?s)(?=.*?^===word-list===$.*?\<\1\>)
Wrap around checkbox: ticked
Search mode radio-button: Regular expression

Step 3: Press the Mark All button

This will highlight in red all of the occurrences of the words from the word list in the larger portion of your text file #2. Again this doesn’t meet your original requirements, which are a bit esoteric, but could prove helpful anyway.
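
To see why the expression picks out only the listed words, here is a small approximation in Python's `re` module. It is not the same engine Notepad++ uses, so some substitutions are assumptions: `\b` stands in for `\<` and `\>`, the `(?s)` switch becomes the `re.DOTALL` flag, and `re.MULTILINE` is set so that `^` and `$` match at line boundaries. The sample text is invented.

```python
import re

# A word is matched only if, somewhere after it, the ===word-list=== marker
# line appears and the same word (backreference \1) occurs after that marker.
PATTERN = re.compile(
    r"\b(\w+)\b(?=.*?^===word-list===$.*?\b\1\b)",
    re.DOTALL | re.MULTILINE,
)

sample = (
    "foo Vvvvvvv bar\n"
    "Ccccccc baz\n"
    "===word-list===\n"
    "Vvvvvvv\n"
    "Ccccccc\n"
)

matches = [m.group(1) for m in PATTERN.finditer(sample)]
print(matches)
```

Words inside the list itself are not matched, because no marker line follows them.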

If this (or ANY posting on the Notepad++ Community site) is useful, don’t reply with a “thanks”, simply up-vote ( click the ^ in the ^ 0 v area on the right ).

Vini Dope

@scott-sumner :

I tried what you said. If I use "Find Next" it does what it's supposed to do and goes through all the words in the word list. However, when I press Find Next 3x it gets messed up and just marks everything instead of the correct words…
The same happens with Mark.

Scott Sumner

@Vini-Dope

Both (red) Marking and Find-Next worked fine for me with a small data set that I made up. If some or all of the data where you see problems with this technique is okay to share, put it up on http://textuploader.com/ and post a link to it here and I'll have a look.

Vini Dope

Could I share it with you privately?

Scott Sumner

@Vini-Dope

Well, if you can’t share it globally you probably shouldn’t share it with me.

I made a much bigger dataset (100000 lines of random text) and was able to duplicate your bad result (all text selected after the Mark operation). To be honest I don’t know what is going on with that…maybe it is just “too much” for the regex engine…

Here’s something else you can try. Take your word list and replace all of the line endings between the words with the vertical bar: |

You can do this by searching the word list for \R and replacing with | (a regular expression search, of course).

Take that long string and copy and paste it into the Find-what zone and do a Mark (or a Find) as described in Steps 2 and 3 above. This should find your words in your large document.

Assuming your words-to-find average 10 characters in length, you can in theory use this technique on roughly 200 words (the limit of the Find-what zone is 2046 characters). I did not try that many in my experiment, but in theory at least…
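
If you would rather build that long string outside Notepad++, a minimal Python sketch could look like the following. The helper name `build_alternation` is hypothetical, the 2046 figure is simply the limit quoted above, and escaping the words with `re.escape` is an extra precaution not mentioned in the steps.

```python
import re

FIND_WHAT_LIMIT = 2046  # Find-what limit as quoted in the post above


def build_alternation(word_list_text):
    """Join the non-empty lines of a word list with | for use as a search expression."""
    words = [w.strip() for w in word_list_text.splitlines() if w.strip()]
    # re.escape guards against words that contain regex metacharacters.
    return "|".join(re.escape(w) for w in words)


word_list = "Xxxxxxxx\nZzzzzzzz\nCcccccc\nVvvvvvv\nBbbbbb\nNnnnnn\n"
pattern = build_alternation(word_list)
fits = len(pattern) <= FIND_WHAT_LIMIT
print(pattern)
print(fits)
```

Skipping blank lines also avoids a trailing | at the end of the string, which would add an empty alternative to the expression.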