    Find paragraph of X words containing multiple keywords ?

    • Alan Kilborn @guy038

      @guy038

      Thank you for the clarifications.

      • n_antiyou @guy038

        @guy038 This is it! It works! Ahahah

        I am not sure if I did it correctly. To be honest, I understood about 10% of what you guys wrote. But I copied

        (?s-i)(?=(\W+\w+){0,29}\W+DOG)(?=(?1){0,29}\W+CAT)(?=(?1){0,29}\W+RABBIT)(?1){30}

        substituting DOG/RABBIT/CAT with 3 other keywords, and Notepad++ found a portion of the text 30 words long containing all 3.

        This is exactly what I was looking for.

        Now, I have only 2 questions left:

        1) If I want to modify the size of the portion of the text I want to find (let’s say from 30 to 50), would it look like this:

        " (?s-i)(?=(\W+\w+){0,49}\W+guidato)(?=(?1){0,49}\W+parte)(?=(?1){0,49}\W+contributi)(?1){50} " ?

        2) What is the expression to achieve the same result, but with 4 and 5 keywords instead of 3? (I know I could just ask what the pattern is for adding more keywords… if you want you can write it down… I am scared that I won’t understand it though)

        • n_antiyou

          Oh, and one more ( sorry )

          3) I see that Notepad++, when I enter the expression and hit "Find", takes me to the place where the portion of the text is, and then highlights it in grey. Is there a way to make it also highlight the 3 keywords INSIDE, with a different color? (any color, even the same color for all 3 keywords)
          You may look at the picture I sent above as an example.

          • PeterJones @n_antiyou

            @n_antiyou

            1. Yep, that’s right.
            2. Each one of the (?=(\W+\w+){0,49}\W+guidato) terms applies to one of your required words. Right now there are three of those terms, each with one of your required words. You just need to add more of the same terms with the new words.
            3. With a different color? Not in the same regular expression, sorry.
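
To make the "add more of the same terms" rule concrete, here is a minimal sketch in Python that only assembles the search string to paste into Notepad++. The function name build_window_regex is hypothetical (it is not part of Notepad++ or of this thread), and the sketch assumes every keyword is a plain word made of \w characters.

```python
import re


def build_window_regex(keywords, window):
    """Assemble the thread's pattern: one lookahead per keyword, then (?1){window}.

    Hypothetical helper for illustration; it only builds a string and does
    not run the search itself.
    """
    if window < 1 or not keywords:
        raise ValueError("need at least one keyword and a window of 1 or more")
    for kw in keywords:
        # Assumption: keywords are plain words, so no escaping is needed.
        if not re.fullmatch(r"\w+", kw):
            raise ValueError("keyword is not a plain word: %r" % kw)

    parts = ["(?s-i)"]
    for i, kw in enumerate(keywords):
        # The first lookahead defines group 1; the others re-use it via (?1).
        word = r"(\W+\w+)" if i == 0 else "(?1)"
        parts.append("(?=%s{0,%d}\\W+%s)" % (word, window - 1, kw))
    parts.append("(?1){%d}" % window)
    return "".join(parts)


print(build_window_regex(["DOG", "CAT", "RABBIT"], 30))
```

For three keywords and a window of 30 this prints the same expression quoted earlier in the thread. Changing the window to 50 turns every {0,29} into {0,49} and the final {30} into {50}.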
            • n_antiyou

              Perfect, so assuming 6 keywords and 300 words as size, it should look like this:

              (?s-i)(?=(\W+\w+){0,299}\W+word1)(?=(\W+\w+){0,299}\W+word2)(?=(\W+\w+){0,299}\W+word3)(?=(\W+\w+){0,299}\W+word4)(?=(\W+\w+){0,299}\W+word5)(?=(\W+\w+){0,299}\W+word6)(?1){300}

              Correct?

              I’m starting to think this is a bit too complex though. Not the expression per se, since once I understand how it works I can make new ones on my own, but the process takes time. Aren’t there programs that do this kind of search with a friendly UI?
              Maybe I could find people on Fiverr to develop a Google Chrome extension that does this kind of search, so that it would also work on a PDF without converting it to txt altogether.

              • guy038

                Hi, @n_antiyou,

                Give me a few minutes! Your last regex can be simplified ;-))

                BR

                guy038

                • Alan Kilborn @n_antiyou

                  @n_antiyou said in Find paragraph of X words containing multiple keywords ?:

                  Aren’t there programs that do this kind of research with a friendly UI?

                  Are there? I guess you’d have to go and find them then.

                  Maybe I could find people on Fiverr to develop a Google Chrome extension that does this kind of search, so that it would also work on a PDF without converting it to txt altogether.

                  Are there people standing by just to do this sort of thing?
                  That’s nice if so.
                  Maybe they can field some of the oddball regex questions we get asked here.

                  • n_antiyou

                    There might be; there are people who do all sorts of things on Fiverr, it seems. XD

                    • guy038

                      Hello, @n_antiyou, @peterjones, @alan-kilborn and All

                      @n_antiyou,

                      For 6 keywords, you can use the single-line regex below, in free-spacing mode, which lets you put spaces anywhere within this long regex for better readability!

                      SEARCH / MARK (?xs-i) (?=(\W+ \w+){0,299} \W+ Word_1) (?=(?1){0,299} \W+ Word_2) (?=(?1){0,299} \W+ Word_3) (?=(?1){0,299} \W+ Word_4) (?=(?1){0,299} \W+ Word_5) (?=(?1){0,299} \W+ Word_6) (?1){300}


                      Now, you may use the Search > Mark All > Using #th style in order to highlight your keywords with a specific color. Note that, for your 6th keyword, you’ll have to cheat a bit by applying two successive highlightings to the same word ! Just try to mix two styles ;-))

                      Best Regards,

                      guy038

                      P.S. :

                      Note that the syntax (\W+ \w+), near the beginning of the regex, defines group 1, containing the sub-regex \W+\w+, which is re-used further on thanks to the simple syntax (?1). This syntax re-applies the sub-pattern itself, not the text that group 1 previously matched.
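
To see what the (?1) re-use amounts to, here is a small sketch in Python. Python's standard re module does not support the (?1) syntax, so the sketch writes the sub-regex \W+\w+ out in full each time (as a non-capturing group), which is what (?1) stands for. The function name window_pattern and the sample sentence are made up for illustration, and (?s) replaces (?s-i) because matching is already case-sensitive by default in Python.

```python
import re


def window_pattern(keywords, window):
    """Compile the thread's pattern with the (?1) calls written out in full.

    Hypothetical helper: Python's re has no (?1), so the repeated unit
    \\W+\\w+ is spelled out wherever the Notepad++ regex uses (?1).
    """
    unit = r"(?:\W+\w+)"  # one "word": separator characters, then word characters
    lookaheads = "".join(
        "(?=%s{0,%d}\\W+%s)" % (unit, window - 1, re.escape(kw))
        for kw in keywords
    )
    return re.compile("(?s)" + lookaheads + "%s{%d}" % (unit, window))


text = "the quick DOG chased a CAT up the tree today"

# A 5-word window containing both keywords exists, starting before "quick".
match = window_pattern(["DOG", "CAT"], 5).search(text)
print(repr(match.group(0)))  # ' quick DOG chased a CAT'

# No 3-word window holds both keywords, so the search finds nothing.
print(window_pattern(["DOG", "CAT"], 3).search(text))  # None
```

Note that a match always starts on the separator before a word, because every unit begins with \W+. That is why the very first word of the text can never be part of a match.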

                      You’ll find some links to improve your regex skills here!

                      • guy038

                        Hi, @n_antiyou and All,

                        As I said in my previous post, you may mix some of the 5 styles available by default to get other colors, in order to color all your keywords!

                        Refer to this post by @Claudia-Frank, who, unfortunately, is no longer active on this forum! Her contribution was quite important and she provided a large number of excellent Python scripts, too! Let’s wish her all the best and good coding moments ;-))

                        https://community.notepad-plus-plus.org/post/27621


                        With the help of the NppQCP plugin (Quick Color Plugin), I put together a Word image which summarizes the main style combinations, with their distinctive colors and RGB values


                        [Image: summary of the main style combinations, with their colors and RGB values]

                        Best Regards,

                        guy038

                        The Community of users of the Notepad++ text editor.