Parallel searching for 2 Names
-
This is something that can normally be achieved with regular expressions (regex).
Something like
name_one.{1000}name_two
means: search for name_one, followed by exactly 1000 characters, after which name_two must appear.
But regex can get really complicated.
You can find more information here. -
Building on what @Ekopalypse said…
I’d think
(?s)regex1.{0,1000}?regex2
meets the need (let us start with the “within 1000 characters” spec).
Can you show us how it doesn’t meet the need?
Sample data is certainly welcome, and probably helps. -
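For anyone who wants to compare the exact-count form with the “within N characters” form outside Notepad++, here is a minimal sketch in Python. It uses Python's re module rather than the Boost engine that Notepad++ uses, and the sample string and the limits 5 and 20 are made up for illustration.

```python
import re

# Made-up sample: 8 characters (including a line break) sit between the names.
text = "name_one 123\n45 name_two"

# Exactly 5 characters between the names: no match, the gap is 8 long.
exact = re.search(r"(?s)name_one.{5}name_two", text)

# At most 5 characters: no match either, the gap is still too long.
too_small = re.search(r"(?s)name_one.{0,5}?name_two", text)

# At most 20 characters: matches; (?s) lets the dot cross the line break.
bounded = re.search(r"(?s)name_one.{0,20}?name_two", text)

# Same limit without (?s): no match, because the dot stops at the line break.
no_dotall = re.search(r"name_one.{0,20}?name_two", text)
```

The lazy `?` after the quantifier only affects where the match ends when name_two occurs several times; it does not change whether a match exists.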
search for 2 Names (or expressions) at the same time
These are all different things one could try to match (limited by the range):
- either John or Mary
- <John><arbitrary text><Mary>
- either <John><arbitrary text><Mary> or <Mary><arbitrary text><John>
- for <John><text><Mary>, if we encounter <John><text><John><text><Mary>, the whole match could start from the first or the last occurrence of <John>
Each would require its own expression.
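To make the differences concrete, here is a rough sketch of one expression per case. It is written for Python's re module, not for Notepad++, and the range limit of 50 characters and the sample strings are assumptions chosen only for the demonstration.

```python
import re

text = "John a John b Mary c John"
GAP = r".{0,50}?"  # hypothetical range limit of 50 characters

# 1. Either name, each occurrence on its own.
either = re.findall(r"John|Mary", text)

# 2. John, arbitrary text, Mary; the match starts at the first John.
from_first = re.search(rf"John{GAP}Mary", text, re.DOTALL).group(0)

# 3. Either order: one alternative per order.
any_order = re.compile(rf"John{GAP}Mary|Mary{GAP}John", re.DOTALL)
reversed_hit = any_order.search("Mary x John").group(0)

# 4. Start at the last John before Mary: forbid another John inside the gap.
from_last = re.search(r"John(?:(?!John).){0,50}?Mary", text, re.DOTALL).group(0)

print(either, from_first, reversed_hit, from_last, sep="\n")
```

In case 4 the negative lookahead is checked before every character of the gap, so the gap can never run across a second John.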
-
Don’t give the OP ideas about how to upscope his need! :-)
-
@alan-kilborn
Thank you!!
This does in fact solve the problem.
At first I thought it did not, because in this text:
1 Botvinnik,Mikhail Moiseevich * ½ ½ ½ ½ 1 0 ½ ½ 1 1 1 1 1 1 1 11.0/15 71.75
2 Smyslov,Vassily Vasilievich ½ * ½ ½ ½ ½ ½ ½ ½ 1 1 1 1 1 1 1 11.0/15 71.50
3 Taimanov,Mark Evgenievich ½ ½ * ½ 1 1 ½ ½ ½ ½ ½ 1 ½ 1 1 1 10.5/15
4 Gligoric,Svetozar ½ ½ ½ * 0 ½ ½ ½ 1 ½ ½ 1 1 1 1 1 10.0/15
5 Bronstein,David Ionovich ½ ½ 0 1 * ½ ½ ½ ½ ½ 1 ½ 1 ½ 1 1 9.5/15
6 Najdorf,Miguel 0 ½ 0 ½ ½ * ½ ½ 1 ½ ½ ½ 1 1 1 1 9.0/15
7 Keres,Paul 1 ½ ½ ½ ½ ½ * 1 0 ½ 0 ½ ½ ½ 1 1 8.5/15 61.25
8 Pachman,Ludek ½ ½ ½ ½ ½ ½ 0 * ½ ½ ½ ½ ½ 1 1 1 8.5/15 56.00
9 Unzicker,Wolfgang ½ ½ ½ 0 ½ 0 1 ½ * 1 ½ ½ ½ 1 0 1 8.0/15 56.25
10 Stahlberg,Anders Gideon Tom 0 0 ½ ½ ½ ½ ½ ½ 0 * ½ ½ 1 1 1 1 8.0/15 48.25
11 Szabo,Laszlo 0 0 ½ ½ 0 ½ 1 ½ ½ ½ * ½ ½ ½ 0 ½ 6.0/15
12 Padevsky,Nikola Bochev 0 0 0 0 ½ ½ ½ ½ ½ ½ ½ * 0 ½ 1 ½ 5.5/15 34.75
13 Uhlmann,Wolfgang 0 0 ½ 0 0 0 ½ ½ ½ 0 ½ 1 * 1 ½ ½ 5.5/15 32.50
14 Ciocaltea,Victor 0 0 0 0 ½ 0 ½ 0 0 0 ½ ½ 0 * 1 ½ 3.5/15
15 Sliwa,Bogdan 0 0 0 0 0 0 0 0 1 0 1 0 ½ 0 * ½ 3.0/15
16 Golombek,Harry 0 0 0 0 0 0 0 0 0 0 ½ ½ ½ ½ ½ * 2.5/15
(?s)Botvinnik.{0,1000}?Golombek
does not find the two players, but
(?s)Botvinnik.{0,1000}?Uhlmann
works.
Why do I need
(?s)Botvinnik.{0,1800}?Golombek
to find the expressions, even though the lines are less than 80 characters long?
Anyway, I will search for 2000 and find everything I need.
-
@erich-siebenhaar said in Parallel searching for 2 Names:
Why do I need (?s)Botvinnik.{0,1800}?Golombek to find the expressions, even though the lines are less than 80 characters long?
In your example text, I see that the end of Botvinnik and the start of Golombek are 1092 positions apart. Thus using 1000 instead of 1800 isn’t going to find it.
Interestingly, however, your data uses the UTF-8 multibyte character ½. This character is encoded into 2 bytes each time it occurs.
If I replace ½ with a single-byte character, e.g. 1, and repeat the search using 1000, it succeeds in finding the match, because now the position difference between the two words is less than 1000.
Thus it appears that the regex count quantifiers are unaware of multibyte character encoding. :-( I don’t like this… something like
.{1000}
should match 1000 characters, not 1000 bytes. @guy038, do you have some comment on this? -
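For reference, this is how a character count and a UTF-8 byte count diverge for a few cells of the sample data. The sketch is in Python and only illustrates the two ways of counting; it says nothing about what Notepad++ does internally.

```python
import re

row = "½ * ½ ½"  # a few cells cut from the table above

chars = len(row)                        # counts characters (code points)
utf8_bytes = len(row.encode("utf-8"))   # ½ (U+00BD) takes 2 bytes in UTF-8

# On a str, the dot consumes one character per repetition.
as_text = re.fullmatch(r".{7}", row)

# On the encoded bytes, the dot consumes one byte per repetition.
as_bytes = re.fullmatch(rb".{10}", row.encode("utf-8"))

print(chars, utf8_bytes)
```

Each ½ adds one extra byte, so the two counts differ by exactly the number of ½ characters in the span.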
Hello, @erich-siebenhaar, @ekopalypse, @alan-kilborn and All,
Alan, don’t worry ! The regex dot symbol ( . ) does count characters and not bytes ;-))
I don’t know which encoding was active when you tested, or it could have been a wrong selection !
I will consider the text :
1 Botvinnik,Mikhail Moiseevich * ½ ½ ½ ½ 1 0 ½ ½ 1 1 1 1 1 1 1 11.0/15 71.75
2 Smyslov,Vassily Vasilievich ½ * ½ ½ ½ ½ ½ ½ ½ 1 1 1 1 1 1 1 11.0/15 71.50
3 Taimanov,Mark Evgenievich ½ ½ * ½ 1 1 ½ ½ ½ ½ ½ 1 ½ 1 1 1 10.5/15
4 Gligoric,Svetozar ½ ½ ½ * 0 ½ ½ ½ 1 ½ ½ 1 1 1 1 1 10.0/15
5 Bronstein,David Ionovich ½ ½ 0 1 * ½ ½ ½ ½ ½ 1 ½ 1 ½ 1 1 9.5/15
6 Najdorf,Miguel 0 ½ 0 ½ ½ * ½ ½ 1 ½ ½ ½ 1 1 1 1 9.0/15
7 Keres,Paul 1 ½ ½ ½ ½ ½ * 1 0 ½ 0 ½ ½ ½ 1 1 8.5/15 61.25
8 Pachman,Ludek ½ ½ ½ ½ ½ ½ 0 * ½ ½ ½ ½ ½ 1 1 1 8.5/15 56.00
9 Unzicker,Wolfgang ½ ½ ½ 0 ½ 0 1 ½ * 1 ½ ½ ½ 1 0 1 8.0/15 56.25
10 Stahlberg,Anders Gideon Tom 0 0 ½ ½ ½ ½ ½ ½ 0 * ½ ½ 1 1 1 1 8.0/15 48.25
11 Szabo,Laszlo 0 0 ½ ½ 0 ½ 1 ½ ½ ½ * ½ ½ ½ 0 ½ 6.0/15
12 Padevsky,Nikola Bochev 0 0 0 0 ½ ½ ½ ½ ½ ½ ½ * 0 ½ 1 ½ 5.5/15 34.75
13 Uhlmann,Wolfgang 0 0 ½ 0 0 0 ½ ½ ½ 0 ½ 1 * 1 ½ ½ 5.5/15 32.50
14 Ciocaltea,Victor 0 0 0 0 ½ 0 ½ 0 0 0 ½ ½ 0 * 1 ½ 3.5/15
15 Sliwa,Bogdan 0 0 0 0 0 0 0 0 1 0 1 0 ½ 0 * ½ 3.0/15
16 Golombek,Harry 0 0 0 0 0 0 0 0 0 0 ½ ½ ½ ½ ½ * 2.5/15
As for me, the number of characters from right after the word Botvinnik to right before the word Golombek is exactly 975. So :
- The regex
(?s)Botvinnik.{975}Golombek
does find the range of chars and both words
- The regex
(?s)Botvinnik.{974}Golombek
does not find anything, and neither does the regex
(?s)Botvinnik.{976}Golombek
Like you, I would have been rather upset if the count operation concerned bytes and not chars :-((
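The same check can be repeated on any text by measuring the gap first and then trying the neighbouring counts. A small sketch in Python follows; the two shortened rows are a made-up excerpt, so the gap here is 10 and not 975, and `hits` is a hypothetical helper name.

```python
import re

# Two shortened rows standing in for the full table (made-up excerpt).
text = "1 Botvinnik ½ ½ 1\n16 Golombek 0 ½"

# Length of the gap: end of the first word to start of the second.
gap = text.index("Golombek") - (text.index("Botvinnik") + len("Botvinnik"))

def hits(n: int) -> bool:
    """True if exactly n characters separate the two words."""
    return re.search(rf"(?s)Botvinnik.{{{n}}}Golombek", text) is not None

# Only the exact count matches; one less or one more does not.
print(gap, hits(gap - 1), hits(gap), hits(gap + 1))
```

Because the gap is counted in characters, each ½ contributes 1 to it, however many bytes the file uses to store it.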
Now, Erich, here is an improved regex to find each word, with their exact case, whatever their order :
SEARCH
(?s-i)(?:(Name_1)|(Name_2)).{0,2000}?(?(1)(?2)|(?1))
For instance, with your example :
SEARCH
(?s-i)(?:(Botvinnik)|(Golombek)).{0,2000}?(?(1)(?2)|(?1))
SEARCH
(?s-i)(?:(Padevsky)|(Gligoric)).{0,2000}?(?(1)(?2)|(?1))
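The conditional (?(1)...|...) is what makes the order irrelevant: if group 1 took part in the match, the second name is required next, otherwise the first one. Here is a minimal sketch of that idea in Python. Python's re supports the conditional but not the (?1) and (?2) subroutine calls, nor a global (?-i), so the names are spelled out again in the conditional; that is an adaptation for Python, not the regex above, and the three sample strings are made up.

```python
import re

# Python adaptation: (?2) and (?1) are replaced by the literal names.
pair = re.compile(
    r"(?s)(?:(Botvinnik)|(Golombek)).{0,2000}?(?(1)Golombek|Botvinnik)"
)

forward = "1 Botvinnik ½ ½\n16 Golombek 0 0"    # Botvinnik comes first
backward = "16 Golombek 0 0\n1 Botvinnik ½ ½"   # Golombek comes first
alone = "1 Botvinnik ½ ½\n2 Smyslov ½ ½"        # second name is missing

for sample in (forward, backward, alone):
    found = pair.search(sample)
    print(found.group(0) if found else None)
```

Both orders produce a match that runs from one name to the other; a text containing only one of the two names produces no match.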
Here is a second regex to find each word, with their exact case, whatever their order too, but :
- A first click on the Find Next button finds the first word
- A second click on the Find Next button finds the second word
SEARCH
(?s-i).*?\K(?:(Name_1)|(Name_2))|.{0,2000}?\K(?(1)(?2)|(?1))
Always with your example :
SEARCH
(?s-i).*?\K(?:(Botvinnik)|(Golombek))|.{0,2000}?\K(?(1)(?2)|(?1))
SEARCH
(?s-i).*?\K(?:(Padevsky)|(Gligoric))|.{0,2000}?\K(?(1)(?2)|(?1))
Best Regards,
guy038
-
-
Hi, @erich-siebenhaar, @ekopalypse, @alan-kilborn and All,
Sorry, I forgot to discuss your other case : find two words separated by, let’s say, not more than 50 lines.
In that case, the first regex, matching both words and the lines in between, is :
SEARCH
(?-si)(?:(Name_1)|(Name_2)).*\R(.*\R){0,50}.*(?(1)(?2)|(?1))
Test these two regexes, below, against your example :
SEARCH
(?-si)(?:(Botvinnik)|(Golombek)).*\R(.*\R){0,50}.*(?(1)(?2)|(?1))
SEARCH
(?-si)(?:(Padevsky)|(Gligoric)).*\R(.*\R){0,50}.*(?(1)(?2)|(?1))
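For experimenting with the line-based variant outside Notepad++, here is a rough Python equivalent. It is an adaptation, not the regex above: Python's re has no \R and no (?1) calls, so the sketch uses \n and repeats the names inside the conditional. The helper name `within_lines`, the shortened sample rows and the limits are assumptions for illustration.

```python
import re

def within_lines(name_1: str, name_2: str, max_between: int) -> re.Pattern:
    """Both names, in any order, with at most max_between whole lines between."""
    a, b = re.escape(name_1), re.escape(name_2)
    # Rest of the first line, up to max_between full lines, then the line
    # holding the other name. The dot does not cross line breaks here.
    return re.compile(
        rf"(?:({a})|({b})).*\n(?:.*\n){{0,{max_between}}}.*(?(1){b}|{a})"
    )

# Shortened, made-up rows: two full lines lie between the two names.
text = "4 Gligoric\n5 Bronstein\n6 Najdorf\n12 Padevsky\n13 Uhlmann"

wide = within_lines("Padevsky", "Gligoric", 2).search(text)
narrow = within_lines("Padevsky", "Gligoric", 1).search(text)
print(wide.group(0) if wide else None, narrow)
```

As in the regex above, at least one line break is required between the two names, so two names on the same line are not reported.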
Unfortunately, when dealing with lines rather than characters, I was unable to work out the second regex version, which would find the first word, then the second !
Note : Of course, if you do not care about case, change any -i modifier to the i modifier, which leads to :
- (?si)... , in my previous post
- (?i-s)... , in this present post !
BR
guy038
-
-
@guy038 said in Parallel searching for 2 Names:
Alan, don’t worry ! the regex dot symbol ( . ) does count characters and not bytes ;-))
Not sure what I originally did when I experimented with the data.
I’m sure that file encoding was UTF-8.
But trying it again now,
(?s)Botvinnik.{0,1000}?Golombek
definitely does work on the OP’s data, so… sorry for the noise. -
@guy038 said in Parallel searching for 2 Names:
SEARCH (?s-i)(?:(Name_1)|(Name_2)).{0,2000}?(?(1)(?2)|(?1))
I think if you are making this into a generic formula, it should be:
(?s-i)(?:(Name_1)|(Name_2)).{0,Max_chars}?(?(1)(?2)|(?1))
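If the formula is needed often, filling it in can be scripted. Here is a small sketch in Python that only builds the search string to paste into the Find what field; it does not run the regex. The function name `pair_regex` is hypothetical, and the two names are inserted verbatim, so they may themselves be regexes but must not contain capturing groups of their own.

```python
def pair_regex(name_1: str, name_2: str, max_chars: int) -> str:
    """Fill in the generic formula; names are inserted as-is, unescaped."""
    return f"(?s-i)(?:({name_1})|({name_2})).{{0,{max_chars}}}?(?(1)(?2)|(?1))"

search = pair_regex("Botvinnik", "Golombek", 2000)
print(search)
```

With these arguments the result is the same string as the Botvinnik and Golombek example given earlier in the thread.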