    Parallel searching for 2 Names

    • EkopalypseE
      Ekopalypse @Erich Siebenhaar
      last edited by

      @erich-siebenhaar

      This is something that can normally be achieved with regular expressions (regex).
      Something like name_one.{1000}name_two means: search for name_one, followed by exactly 1000 characters, followed by name_two.
      But regex can get really complicated.
      You can find more information here.

      • Alan KilbornA
        Alan Kilborn @Erich Siebenhaar
        last edited by

        @erich-siebenhaar

        Building on what @Ekopalypse said…

        I’d think (?s)regex1.{0,1000}?regex2 meets the need (let us start with the “within 1000 characters” spec).

        Can you show us how it doesn’t meet the need?
        Sample data is certainly welcome, and probably helps.
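A minimal sketch of the bounded-gap idea, written for Python's `re` module rather than the Boost engine Notepad++ uses, so details may differ. The words `alpha` and `omega` and the sample text are hypothetical stand-ins for the two names.

```python
import re

# Hypothetical sample: two words separated by 22 characters, line breaks included.
text = "alpha\n" + "x" * 20 + "\nomega"

# (?s) lets the dot match line breaks; {0,1000}? is a lazy, bounded gap.
bounded = re.compile(r"(?s)alpha.{0,1000}?omega")
# Without (?s), Python's dot stops at "\n", so the gap cannot span lines.
no_dotall = re.compile(r"alpha.{0,1000}?omega")
# An exact count such as .{1000} demands precisely that many characters.
exact = re.compile(r"(?s)alpha.{1000}omega")

print(bool(bounded.search(text)))    # True
print(bool(no_dotall.search(text)))  # False
print(bool(exact.search(text)))      # False
```

The range form `{0,1000}` is what makes this a "within 1000 characters" search; the exact form only matches when the gap is exactly that long.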

        • Neil SchipperN
          Neil Schipper @Erich Siebenhaar
          last edited by

          @erich-siebenhaar:

          search for 2 Names (or expressions) at the same time

          These are all different things one could try to match (limited by the range):

          • either John or Mary
          • <John><arbitrary text><Mary>
          • either <John><arbitrary text><Mary> or <Mary><arbitrary text><John>
          • for <John><text><Mary>, if we encounter <John><text><John><text><Mary>, the whole match could start from the first or the last occurrence of <John>

          Each would require its own expression.
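To make the four cases concrete, here is a rough sketch using Python's `re` module (not Notepad++'s engine). The sample sentence and the 100-character limit are assumptions chosen only for illustration.

```python
import re

text = "John met Ann. John later met Mary."

# 1) Either name on its own.
either = re.findall(r"John|Mary", text)

# 2) John, then bounded arbitrary text, then Mary.
#    Even with a lazy gap, the match starts at the FIRST John that can reach Mary.
ordered = re.search(r"(?s)John.{0,100}?Mary", text)

# 3) Either order: spell out both alternatives.
any_order = re.compile(r"(?s)John.{0,100}?Mary|Mary.{0,100}?John")

# 4) Start from the LAST John before Mary: forbid another John inside the gap.
last_john = re.search(r"(?s)John(?:(?!John).){0,100}?Mary", text)

print(either)             # ['John', 'John', 'Mary']
print(ordered.group())    # John met Ann. John later met Mary
print(last_john.group())  # John later met Mary
```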

          • Alan KilbornA
            Alan Kilborn @Neil Schipper
            last edited by

            @neil-schipper

            Don’t give the OP ideas about how to upscope his need! :-)

            • Erich SiebenhaarE
              Erich Siebenhaar @Alan Kilborn
              last edited by

              @alan-kilborn
              Thank you!!
              This does in fact solve the problem.
              At first I thought it did not, because in this text:

              1 Botvinnik,Mikhail Moiseevich * ½ ½ ½ ½ 1 0 ½ ½ 1 1 1 1 1 1 1 11.0/15 71.75
              2 Smyslov,Vassily Vasilievich ½ * ½ ½ ½ ½ ½ ½ ½ 1 1 1 1 1 1 1 11.0/15 71.50
              3 Taimanov,Mark Evgenievich ½ ½ * ½ 1 1 ½ ½ ½ ½ ½ 1 ½ 1 1 1 10.5/15
              4 Gligoric,Svetozar ½ ½ ½ * 0 ½ ½ ½ 1 ½ ½ 1 1 1 1 1 10.0/15
              5 Bronstein,David Ionovich ½ ½ 0 1 * ½ ½ ½ ½ ½ 1 ½ 1 ½ 1 1 9.5/15
              6 Najdorf,Miguel 0 ½ 0 ½ ½ * ½ ½ 1 ½ ½ ½ 1 1 1 1 9.0/15
              7 Keres,Paul 1 ½ ½ ½ ½ ½ * 1 0 ½ 0 ½ ½ ½ 1 1 8.5/15 61.25
              8 Pachman,Ludek ½ ½ ½ ½ ½ ½ 0 * ½ ½ ½ ½ ½ 1 1 1 8.5/15 56.00
              9 Unzicker,Wolfgang ½ ½ ½ 0 ½ 0 1 ½ * 1 ½ ½ ½ 1 0 1 8.0/15 56.25
              10 Stahlberg,Anders Gideon Tom 0 0 ½ ½ ½ ½ ½ ½ 0 * ½ ½ 1 1 1 1 8.0/15 48.25
              11 Szabo,Laszlo 0 0 ½ ½ 0 ½ 1 ½ ½ ½ * ½ ½ ½ 0 ½ 6.0/15
              12 Padevsky,Nikola Bochev 0 0 0 0 ½ ½ ½ ½ ½ ½ ½ * 0 ½ 1 ½ 5.5/15 34.75
              13 Uhlmann,Wolfgang 0 0 ½ 0 0 0 ½ ½ ½ 0 ½ 1 * 1 ½ ½ 5.5/15 32.50
              14 Ciocaltea,Victor 0 0 0 0 ½ 0 ½ 0 0 0 ½ ½ 0 * 1 ½ 3.5/15
              15 Sliwa,Bogdan 0 0 0 0 0 0 0 0 1 0 1 0 ½ 0 * ½ 3.0/15
              16 Golombek,Harry 0 0 0 0 0 0 0 0 0 0 ½ ½ ½ ½ ½ * 2.5/15

              (?s)Botvinnik.{0,1000}?Golombek

              does not find the two players, but

              (?s)Botvinnik.{0,1000}?Uhlmann works.

              Why do I need (?s)Botvinnik.{0,1800}?Golombek to find the expressions, even though the lines are less than 80 characters long?

              Anyway, I will search for 2000 and find everything I need.

              • Alan KilbornA
                Alan Kilborn @Erich Siebenhaar
                last edited by

                @erich-siebenhaar said in Parallel searching for 2 Names:

                Why do I need (?s)Botvinnik.{0,1800}?Golombek to find the expressions, even though the lines are less than 80 characters long?

                In your example text, I see that the end of Botvinnik and the start of Golombek are 1092 positions apart. Thus using 1000 instead of 1800 isn’t going to find it.

                Interesting, however, is your use of the UTF-8 multibyte character ½. This character is encoded into 2 bytes each time it occurs.

                If I replace ½ with a single-byte character, e.g. 1, and repeat the search using 1000, it succeeds in finding the match, because now the position difference between the two words is less than 1000.

                Thus it appears that the regex count quantifiers are unaware of multibyte character encoding. :-( I don’t like this… something like .{1000} should match 1000 characters, not 1000 bytes. @guy038 , do you have some comment on this?
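For readers unsure what "characters versus bytes" means in practice, here is a small Python illustration. It only shows the distinction itself; it says nothing about how Notepad++ actually counts. The one-line `row` is a hypothetical sample in the spirit of the crosstable.

```python
import re

# Hypothetical one-line sample containing the multibyte character "½".
row = "Botvinnik ½ ½ ½ Golombek"
gap = row[len("Botvinnik"):row.index("Golombek")]

n_chars = len(gap)                  # counts code points
n_bytes = len(gap.encode("utf-8"))  # "½" (U+00BD) takes 2 bytes in UTF-8
print(n_chars, n_bytes)             # 7 10

# On a str, the quantifier counts characters ...
print(bool(re.search(r"Botvinnik.{7}Golombek", row)))    # True

# ... while on the raw UTF-8 bytes the same quantifier counts bytes.
raw = row.encode("utf-8")
print(bool(re.search(rb"Botvinnik.{7}Golombek", raw)))   # False
print(bool(re.search(rb"Botvinnik.{10}Golombek", raw)))  # True
```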

                • guy038G
                  guy038
                  last edited by guy038

                  Hello, @erich-siebenhaar, @ekopalypse, @alan-kilborn and All,

                  Alan, don’t worry ! the regex dot symbol ( . ) does count characters and not bytes ;-))

                  I don’t know what your current encoding was when you tested, or it could have been a wrong selection !

                  I will consider the text :

                  1 Botvinnik,Mikhail Moiseevich * ½ ½ ½ ½ 1 0 ½ ½ 1 1 1 1 1 1 1 11.0/15 71.75
                  2 Smyslov,Vassily Vasilievich ½ * ½ ½ ½ ½ ½ ½ ½ 1 1 1 1 1 1 1 11.0/15 71.50
                  3 Taimanov,Mark Evgenievich ½ ½ * ½ 1 1 ½ ½ ½ ½ ½ 1 ½ 1 1 1 10.5/15
                  4 Gligoric,Svetozar ½ ½ ½ * 0 ½ ½ ½ 1 ½ ½ 1 1 1 1 1 10.0/15
                  5 Bronstein,David Ionovich ½ ½ 0 1 * ½ ½ ½ ½ ½ 1 ½ 1 ½ 1 1 9.5/15
                  6 Najdorf,Miguel 0 ½ 0 ½ ½ * ½ ½ 1 ½ ½ ½ 1 1 1 1 9.0/15
                  7 Keres,Paul 1 ½ ½ ½ ½ ½ * 1 0 ½ 0 ½ ½ ½ 1 1 8.5/15 61.25
                  8 Pachman,Ludek ½ ½ ½ ½ ½ ½ 0 * ½ ½ ½ ½ ½ 1 1 1 8.5/15 56.00
                  9 Unzicker,Wolfgang ½ ½ ½ 0 ½ 0 1 ½ * 1 ½ ½ ½ 1 0 1 8.0/15 56.25
                  10 Stahlberg,Anders Gideon Tom 0 0 ½ ½ ½ ½ ½ ½ 0 * ½ ½ 1 1 1 1 8.0/15 48.25
                  11 Szabo,Laszlo 0 0 ½ ½ 0 ½ 1 ½ ½ ½ * ½ ½ ½ 0 ½ 6.0/15
                  12 Padevsky,Nikola Bochev 0 0 0 0 ½ ½ ½ ½ ½ ½ ½ * 0 ½ 1 ½ 5.5/15 34.75
                  13 Uhlmann,Wolfgang 0 0 ½ 0 0 0 ½ ½ ½ 0 ½ 1 * 1 ½ ½ 5.5/15 32.50
                  14 Ciocaltea,Victor 0 0 0 0 ½ 0 ½ 0 0 0 ½ ½ 0 * 1 ½ 3.5/15
                  15 Sliwa,Bogdan 0 0 0 0 0 0 0 0 1 0 1 0 ½ 0 * ½ 3.0/15
                  16 Golombek,Harry 0 0 0 0 0 0 0 0 0 0 ½ ½ ½ ½ ½ * 2.5/15
                  

                  For me, the number of characters from right after the word Botvinnik to right before the word Golombek is exactly 975. So :

                  • The regex (?s)Botvinnik.{975}Golombek does find the range of chars and both words

                  • The regex (?s)Botvinnik.{974}Golombek does not find anything, and neither does the regex (?s)Botvinnik.{976}Golombek
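The same kind of check can be sketched outside the editor. The Python snippet below measures the gap between two words and confirms that only the exact count matches. `gap_length` is a hypothetical helper and `sample` is a shortened stand-in for the table, so the number it yields is not the 975 quoted above.

```python
import re

def gap_length(text: str, first: str, second: str) -> int:
    """Characters between the end of `first` and the start of the next `second`."""
    start = text.index(first) + len(first)
    return text.index(second, start) - start

# Hypothetical shortened sample; each "\n" counts as one character.
sample = "1 Botvinnik ½ 1\n2 Smyslov ½ *\n16 Golombek 0 0"
n = gap_length(sample, "Botvinnik", "Golombek")

# Only the exact count matches; one less or one more finds nothing.
results = {}
for count in (n - 1, n, n + 1):
    hit = re.search(rf"(?s)Botvinnik.{{{count}}}Golombek", sample)
    results[count] = hit is not None
    print(count, results[count])
```

If a file uses Windows (CRLF) line endings, each line break contributes two characters to such a count, which is worth keeping in mind when comparing numbers.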

                  Like you, I would have been rather upset if the count operation had concerned bytes and not chars :-((


                  Now, Erich, here is an improved regex to find both words, with their exact case, whatever their order :

                  SEARCH (?s-i)(?:(Name_1)|(Name_2)).{0,2000}?(?(1)(?2)|(?1))

                  For instance, with your example :

                  SEARCH (?s-i)(?:(Botvinnik)|(Golombek)).{0,2000}?(?(1)(?2)|(?1))

                  SEARCH (?s-i)(?:(Padevsky)|(Gligoric)).{0,2000}?(?(1)(?2)|(?1))
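A rough Python counterpart of the first example, for anyone who wants to try the idea in a script. Python's `re` supports the `(?(1)...|...)` conditional but not the `(?1)` / `(?2)` subroutine calls, so this sketch spells the names out again inside the conditional. The two sample strings are hypothetical.

```python
import re

# re.DOTALL plays the role of (?s); matching is case-sensitive by default,
# which corresponds to -i.
pattern = re.compile(
    r"(?:(Botvinnik)|(Golombek)).{0,2000}?(?(1)Golombek|Botvinnik)",
    re.DOTALL,
)

forward = "1 Botvinnik,Mikhail\n16 Golombek,Harry"
backward = "16 Golombek,Harry\n1 Botvinnik,Mikhail"

for text in (forward, backward):
    m = pattern.search(text)
    # Group 1 is set when Botvinnik was seen first, group 2 otherwise.
    first = "Botvinnik" if m.group(1) else "Golombek"
    print(first, "comes first")
```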


                  Here is a second regex to find both words, with their exact case, whatever their order too, but :

                  • A first click, on the Find Next button, finds the first word

                  • A second click, on the Find Next button, finds the second word

                  SEARCH (?s-i).*?\K(?:(Name_1)|(Name_2))|.{0,2000}?\K(?(1)(?2)|(?1))

                  Always with your example :

                  SEARCH (?s-i).*?\K(?:(Botvinnik)|(Golombek))|.{0,2000}?\K(?(1)(?2)|(?1))

                  SEARCH (?s-i).*?\K(?:(Padevsky)|(Gligoric))|.{0,2000}?\K(?(1)(?2)|(?1))
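Python's `re` has no `\K`, so the two-click behaviour described above cannot be copied literally. As a loose emulation, under that assumption, the hypothetical helper below does it in two steps: find whichever name comes first, then look for the other one within `max_chars` characters after it.

```python
import re

def find_pair_stepwise(text, name_1, name_2, max_chars=2000):
    """Yield the span of the first name found, then the span of its partner."""
    first = re.search(f"{re.escape(name_1)}|{re.escape(name_2)}", text)
    if first is None:
        return
    yield first.span()  # "first click"

    other = name_2 if first.group() == name_1 else name_1
    # The partner must start at most max_chars characters after the first hit.
    window_end = first.end() + max_chars + len(other)
    pos = text.find(other, first.end(), window_end)
    if pos != -1:
        yield (pos, pos + len(other))  # "second click"

text = "xx Padevsky yy Gligoric zz"
print(list(find_pair_stepwise(text, "Gligoric", "Padevsky")))
# [(3, 11), (15, 23)]
```

This is only an approximation of the editor workflow; it does not reproduce how the original expression behaves on repeated Find Next clicks.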

                  Best Regards,

                  guy038

                  • guy038G
                    guy038
                    last edited by guy038

                    Hi, @erich-siebenhaar, @ekopalypse, @alan-kilborn and All,

                    Sorry, I forgot to discuss your other case : finding two words separated by, let’s say, not more than 50 lines

                    In that case, the first regex, matching both words and the lines in between, is :

                    SEARCH (?-si)(?:(Name_1)|(Name_2)).*\R(.*\R){0,50}.*(?(1)(?2)|(?1))


                    Test these two regexes, below, against your example :

                    SEARCH (?-si)(?:(Botvinnik)|(Golombek)).*\R(.*\R){0,50}.*(?(1)(?2)|(?1))

                    SEARCH (?-si)(?:(Padevsky)|(Gligoric)).*\R(.*\R){0,50}.*(?(1)(?2)|(?1))
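A hedged Python sketch of the line-based variant. Python's `re` has neither `\R` nor subroutine calls, so this assumes an explicit line-break alternation and repeats the names in the conditional. `within_lines` and the short sample are hypothetical.

```python
import re

def within_lines(name_1: str, name_2: str, max_lines: int = 50) -> re.Pattern:
    """Both names, either order, with at most max_lines full lines in between."""
    a, b = re.escape(name_1), re.escape(name_2)
    line = r"[^\r\n]*"         # rest of a line, never crossing a line break
    eol = r"(?:\r\n|\r|\n)"    # stands in for \R
    return re.compile(
        rf"(?:({a})|({b})){line}{eol}(?:{line}{eol}){{0,{max_lines}}}{line}(?(1){b}|{a})"
    )

sample = "4 Gligoric\n5 Bronstein\n6 Najdorf\n12 Padevsky"

print(bool(within_lines("Padevsky", "Gligoric", 2).search(sample)))  # True
print(bool(within_lines("Padevsky", "Gligoric", 1).search(sample)))  # False
```

As in the expressions above, the two names must sit on different lines, because the pattern requires at least one line break between them.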


                    Unfortunately, when dealing with lines rather than characters, I was unable to work out the second regex version, which would have found the first word, then the second !


                    Note : Of course, if you do not care about case, change any -i modifier to the i modifier, which leads to :

                    • (?si)..., in my previous post

                    • (?i-s)..., in this present post !

                    BR

                    guy038

                    • Alan KilbornA
                      Alan Kilborn @guy038
                      last edited by

                      @guy038 said in Parallel searching for 2 Names:

                      Alan, don’t worry ! the regex dot symbol ( . ) does count characters and not bytes ;-))

                      Not sure what I originally did when I experimented with the data.
                      I’m sure that file encoding was UTF-8.
                      But trying it again now (?s)Botvinnik.{0,1000}?Golombek definitely does work on the OP’s data, so…sorry for the noise.

                      • Alan KilbornA
                        Alan Kilborn @guy038
                        last edited by

                        @guy038 said in Parallel searching for 2 Names:

                        SEARCH (?s-i)(?:(Name_1)|(Name_2)).{0,2000}?(?(1)(?2)|(?1))

                        I think if you are making this into a generic formula, it should be:

                        (?s-i)(?:(Name_1)|(Name_2)).{0,Max_chars}?(?(1)(?2)|(?1))
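If the formula is to be generated rather than typed, one possible sketch in Python follows. `pair_regex` is a hypothetical helper; it escapes the names so that characters such as `.` or `+` are taken literally, and it repeats the names in the conditional because Python's `re` lacks `(?1)` / `(?2)`.

```python
import re

def pair_regex(name_1: str, name_2: str, max_chars: int) -> re.Pattern:
    """Build the either-order pattern for two literal names and a gap limit."""
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    a, b = re.escape(name_1), re.escape(name_2)
    return re.compile(
        rf"(?:({a})|({b})).{{0,{max_chars}}}?(?(1){b}|{a})",
        re.DOTALL,  # equivalent of (?s); case-sensitive by default
    )

rx = pair_regex("Name_1", "Name_2", 5)
print(bool(rx.search("Name_2 ab Name_1")))      # True
print(bool(rx.search("Name_1 abcdef Name_2")))  # False
```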

                        The Community of users of the Notepad++ text editor.