Multiword highlight



  • I’ve got a text file (#1) that looks like this:
    Xxxxxxxx
    Zzzzzzzz
    Ccccccc
    Vvvvvvv
    Bbbbbb
    Nnnnnn

    And I have another text file (#2) with lots of words.
    What I want to do is this: if text file #2 has a word that matches one in #1, I want it marked, along with the one above it. So if I search for vvvvvvv, it should mark both vvvvvvv and ccccccc.
    Also, the text file contains 10k+ lines, so I can’t do it manually. If Notepad++ can’t do it, is there any other way to do it?



  • @Vini-Dope

    First, as far as I know, Notepad++ has no such function, but what it does have is highlighting in the second view,
    which means that if you double-click a word in the first view, the same word gets highlighted in the second view.
    This is available under Settings -> Preferences.

    If this doesn’t suit you, then it is theoretically possible to write a Python script, which means
    you need to install the Python Script plugin to make this work. If you want to
    go this way, let us know exactly what needs to be done, because it makes a huge difference whether your text looks like

    word1
    word2
    word3

    or

    word1 word2 word3 word4 word5 word6 word7 word8 word9 word10
    word11 word12 …

    Cheers
    Claudia
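
    For the one-word-per-line layout, the selection logic such a script would need might look roughly like the sketch below. It is plain Python, not tied to the Python Script plugin API, and it rests on an assumption about the request: "the one above" is read as the entry directly above the matched word in file #1. The names `WORD_LIST` and `words_to_mark` are made up for illustration, and matching is case-sensitive.

```python
import re

# Word list from file #1, one entry per line in the original post.
WORD_LIST = ["Xxxxxxxx", "Zzzzzzzz", "Ccccccc", "Vvvvvvv", "Bbbbbb", "Nnnnnn"]

def words_to_mark(word_list, text):
    """Return the list entries to mark, in list order.

    An entry is marked if it occurs as a whole word in `text` (file #2).
    ASSUMPTION: the entry directly above a matched entry is marked too.
    """
    found = set(re.findall(r"\w+", text))  # every whole word in file #2
    marked = []
    for i, word in enumerate(word_list):
        if word in found:
            # Mark the entry above first, then the match itself, skipping duplicates.
            if i > 0 and word_list[i - 1] not in marked:
                marked.append(word_list[i - 1])
            if word not in marked:
                marked.append(word)
    return marked

print(words_to_mark(WORD_LIST, "some text with Vvvvvvv in it"))
```

    If "the one above" is instead meant as the line above the match in file #2, the loop would have to walk the lines of file #2 rather than the word list.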



  • @Vini-Dope :

    A variant on @Claudia-Frank 's idea could be to use the Styling feature (see Search menu -> Mark All -> Using 1st Style and related) to highlight all of your words, possibly in different colors. But this is a manual process for each word and doesn’t really take what you want to do to its full extent.

    Here’s another idea that also doesn’t go as far as you want but may be useful anyway. With the basic idea inherited from this thread, try the following:

    Step 1: Add this to the bottom of text file #2 (temporarily, can be removed later):

    ===word-list===
    Xxxxxxxx
    Zzzzzzzz
    Ccccccc
    Vvvvvvv
    Bbbbbb
    Nnnnnn
    

    Step 2: Invoke the Mark… feature (Search menu) and set up the following:

    Find what zone: (?-i)\<(\w+)\>(?s)(?=.*?^===word-list===$.*?\<\1\>)
    Wrap around checkbox: ticked
    Search mode radio-button: Regular expression

    Step 3: Press the Mark All button

    This will highlight in red all of the occurrences of the words from the word list in the larger portion of your text file #2. Again this doesn’t meet your original requirements, which are a bit esoteric, but could prove helpful anyway.
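
    To see what the lookahead in that expression does outside Notepad++, here is a rough Python equivalent. It is only an approximation: Python's `re` module is not the engine Notepad++ uses, it has no `\<` and `\>` anchors (so `\b` stands in for both), and the flags are passed as arguments instead of inline. `marked_words` and `sample` are illustrative names.

```python
import re

MARKER = "===word-list==="

# A word is matched only if, somewhere after it, the marker line appears
# and the same word occurs again after that marker (i.e. in the word list).
PATTERN = re.compile(
    r"\b(\w+)\b(?=.*?^===word-list===$.*?\b\1\b)",
    re.DOTALL | re.MULTILINE,  # '.' spans lines; ^ and $ match at line ends
)

def marked_words(document):
    """Return the words the pattern would mark, in document order."""
    return [m.group(1) for m in PATTERN.finditer(document)]

# Small made-up document with the word list appended at the bottom.
sample = "foo Vvvvvvv bar\nBbbbbb baz\n" + MARKER + "\nVvvvvvv\nBbbbbb\n"
print(marked_words(sample))
```

    Every candidate word triggers a scan forward to the marker and through the word list, so the amount of work grows quickly with the size of the document.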

    If this (or ANY posting on the Notepad++ Community site) is useful, don’t reply with a “thanks”, simply up-vote ( click the ^ in the ^ 0 v area on the right ).



  • @scott-sumner :

    I tried what you said. If I use “Find next”, it does what it’s supposed to do and goes through all the words in the word list. However, when I press Find next three times, it gets messed up and just marks everything instead of the correct words…
    The same happens with Mark.



  • @Vini-Dope

    Both (red) marking and Find Next worked fine for me with a small data set that I made up. If some or all of the data where you see problems with this technique is okay to share, put it up on http://textuploader.com/ and post a link to it here, and I’ll have a look.



  • Could I share it with you privately?



  • @Vini-Dope

    Well, if you can’t share it globally you probably shouldn’t share it with me.

    I made a much bigger dataset (100000 lines of random text) and was able to duplicate your bad result (all text selected after the Mark operation). To be honest I don’t know what is going on with that…maybe it is just “too much” for the regex engine…

    Here’s something else you can try. Take your word list and replace all of the line endings between the words with the vertical bar: |

    You can do this by searching the word list for \R and replacing with | (a regular expression search, of course).

    Thus you’ll end up with something like this: Xxxxxxxx|Zzzzzzzz|Ccccccc|Vvvvvvv|Bbbbbb|Nnnnnn

    Take that long string and copy and paste it into the Find-what zone and do a Mark (or a Find) as described in Steps 2 and 3 above. This should find your words in your large document.

    Assuming your words-to-find average 10 characters in length, you can in theory use this technique on roughly 200 words (the limit of the Find-what zone is 2046 characters). I did not try that many in my experiment, but in theory at least…
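
    If the word list is long, joining it by hand is tedious, so the same string can be built with a few lines of Python. This is a sketch only: `build_alternation` and `FIND_WHAT_LIMIT` are illustrative names, the 2046 figure is simply the limit quoted above, and `re.escape` is added as a precaution in case a word contains regex metacharacters.

```python
import re

FIND_WHAT_LIMIT = 2046  # Find-what limit as quoted in the post above

def build_alternation(word_list_text):
    """Join the non-empty lines of the word list with '|'."""
    words = [line.strip() for line in word_list_text.splitlines() if line.strip()]
    return "|".join(re.escape(word) for word in words)

word_list_text = "Xxxxxxxx\nZzzzzzzz\nCcccccc\nVvvvvvv\nBbbbbb\nNnnnnn\n"
pattern = build_alternation(word_list_text)

print(pattern)
print(len(pattern), "characters; fits:", len(pattern) <= FIND_WHAT_LIMIT)

# Quick check of the pattern against a made-up line of text.
print(re.findall(pattern, "aaa Vvvvvvv bbb Bbbbbb"))
```

    The resulting string can then be pasted into the Find-what zone, as long as the length check passes.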

