Help with the regexes

Giuseppe Pulitanò

Hi guys, I need a regex, or any other suggestion, for this:

I must

select every word (or group of words) found between the terms "plural of" and "English", throughout the file,
select the line beginning with that same word,
copy that line's definition to the end of the corresponding (plural of…English) line.

Example:

finisher n.\n1 A person who finishes or completes something
finishers n.\n(plural of finisher English)
finishes n.\n(plural of finish English)
finishes off vb.\n(en-third-person singular of: finish off)
finishest vb.\n(en-archaic second-person singular of: finish)
finisheth vb.\n(en-archaic third-person singular of: finish)
finishing n.\n1 the act of completing something
finish.\nvb.\n(present participle of finish English)
finishing line n.\n(alternative form of finish line English)
finishing lines n.\n(plural of finishing line English)
finishing move n.\n(context video games English) In media such as…
finishing moves n.\n(plural of finishing move English)
finishing off vb.\n(present participle of finish off English)

Afterwards, the text must become:

finisher n.\n1 A person who finishes or completes something
finishers n.\n(plural of finisher English) n.\n1 A person who finishes or completes something
finishes n.\n(plural of finish English)
finishes off vb.\n(en-third-person singular of: finish off)
finishest vb.\n(en-archaic second-person singular of: finish)
finisheth vb.\n(en-archaic third-person singular of: finish)
finishing n.\n1 the act of completing something
finish.\nvb.\n(present participle of finish English)
finishing line n.\n(alternative form of finish line English)
finishing lines n.\n(plural of finishing line English) n.\n(alternative form of finish line English)
finishing move n.\n(context video games English) In media such as…
finishing moves n.\n(plural of finishing move English)n.\n(context video games English) In media such as…
finishing off vb.\n(present participle of finish off English)

Tom Saury

Hi,
The first part is relatively easy:
plural of.*English
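
For anyone scripting this outside the editor, here is a minimal Python sketch of the same idea. It is only an illustration: the function name plural_targets, the capturing group and the literal parentheses are assumptions added so that the word between the two terms can be pulled out, and the lazy .+? stops at the first " English)" instead of running on to a later one on the same line.

```python
import re

# Hypothetical helper: capture whatever sits between "(plural of " and " English)".
# The lazy quantifier keeps each capture as short as possible.
BETWEEN = re.compile(r"\(plural of (.+?) English\)")


def plural_targets(text):
    """Return every word (or group of words) found between the two terms."""
    return BETWEEN.findall(text)


if __name__ == "__main__":
    # The \n inside the definitions is a literal backslash followed by n,
    # as in the sample dictionary, hence the raw string.
    line = "finishers\t" + r"n.\n(plural of finisher English)"
    print(plural_targets(line))
```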

I will search for the rest a bit later.

guy038

@giuseppe-pulitanò, and All,

Before giving you a regex solution, I noticed a peculiarity in your text. Looking at some of your lines, for example:

finisher n.\n1
finishing line n.\n
finishing off vb.\n

We can deduce that a word (like finisher) or a group of words (like finishing off) is always followed by a space character, except on the line beginning with finish.\nvb.\n......, where a dot immediately follows the word finish!

Is this syntax common in your definitions? Or should that particular line be written finish .\nvb.\n...... or even finish ???.\nvb.\n......, where ??? represents an abbreviation?

See you later,

Best Regards,

guy038

Giuseppe Pulitanò

It is a TAB-separated dictionary file; each entry word is followed by a TAB character:

wordTABdefinition

So it is:

finisherTABn.\n1 A person who finishes or completes something
finishersTABn.\n(plural of finisher English)
finishesTABn.\n(plural of finish English)
finishes offTABvb.\n(en-third-person singular of: finish off)
finishestTABvb.\n(en-archaic second-person singular of: finish)
finishethTABvb.\n(en-archaic third-person singular of: finish)
finishingTABn.\n1 the act of completing something
finishTAB.\nvb.\n(present participle of finish English)
finishing lineTABn.\n(alternative form of finish line English)
finishing linesTABn.\n(plural of finishing line English)
finishing moveTABn.\n(context video games English) In media such as…
finishing movesTABn.\n(plural of finishing move English)
finishing offTABvb.\n(present participle of finish off English)

Afterwards, the text must become:

finisherTABn.\n1 A person who finishes or completes something
finishersTABn.\n(plural of finisher English) n.\n1 A person who finishes or completes something
finishesTABn.\n(plural of finish English)
finishes offTABvb.\n(en-third-person singular of: finish off)
finishestTABvb.\n(en-archaic second-person singular of: finish)
finishethTABvb.\n(en-archaic third-person singular of: finish)
finishingTABn.\n1 the act of completing something
finishTAB.\nvb.\n(present participle of finish English)
finishing lineTABn.\n(alternative form of finish line English)
finishing linesTABn.\n(plural of finishing line English) n.\n(alternative form of finish line English)
finishing moveTABn.\n(context video games English) In media such as…
finishing movesTABn.\n(plural of finishing move English)n.\n(context video games English) In media such as…
finishing offTABvb.\n(present participle of finish off English)

guy038

Hi, @giuseppe-pulitanò, and All,

Ah, perfect! It is even easier to build the correct regex, since the \t tabulation character separates each header word from its definition without any ambiguity :-)). So:

Open the Replace dialog ( Ctrl + H )
SEARCH (?-is)^(.+)\t(.+)\R.+plural of\x20\1\x20English\)
REPLACE $0\x20\2
Select the Regular expression search mode
Tick, preferably, the Wrap around option
Click once on the Replace All button, or several times on the Replace button

Et voilà !

Notes :

The search part tries to match two consecutive lines where the header word of the first line, with its exact case, is embedded in the expression plural of ...... English at the end of the second line, with its exact case, too.
At the beginning, the part (?-is) means that:
- The search is case-sensitive, (?-i)
- The regex engine treats the dot as any single standard character only (not an EOL one), (?-s)
Then, the (.+)\t(.+)\R part matches the first line with its line-break, and stores, as groups 1 and 2, the text found, respectively, before and after the tabulation separator \t
And the final part .+plural of\x20\1\x20English\) matches the second line up to and including English), on condition that the header word, \1, is located between the expressions plural of and English, with this exact case
In the replacement, we first rewrite these two lines, untouched, $0, followed by a space character, \x20, and the definition part of the first line, \2
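
The same two-line replacement can be tried outside Notepad++ with a short Python sketch. This is an adaptation under assumptions, not the exact expression above: Python's re module has no \R and does not accept a global (?-is) switch, so the sketch assumes plain \n line endings and relies on Python's defaults (case-sensitive, dot does not match a newline). The names PAIR and append_singular_definitions are made up for the illustration.

```python
import re

# Adapted from the SEARCH expression: group 1 is the header word, group 2 its
# definition, and the next line must contain "plural of <header word> English)".
# Assumes \n line endings (Python's re has no \R).
PAIR = re.compile(r"^(.+)\t(.+)\n.+plural of \1 English\)", re.MULTILINE)


def append_singular_definitions(text):
    """Rewrite each matched pair of lines, then append a space and group 2."""
    # \g<0> is the whole match, the counterpart of $0 in the REPLACE field.
    return PAIR.sub(r"\g<0> \2", text)


# A few lines of the sample dictionary; the \n inside the definitions is a
# literal backslash followed by n, hence the raw strings.
SAMPLE = "\n".join([
    "finisher\t" + r"n.\n1 A person who finishes or completes something",
    "finishers\t" + r"n.\n(plural of finisher English)",
    "finishes\t" + r"n.\n(plural of finish English)",
    "finishing line\t" + r"n.\n(alternative form of finish line English)",
    "finishing lines\t" + r"n.\n(plural of finishing line English)",
])

if __name__ == "__main__":
    print(append_singular_definitions(SAMPLE))
```

As with the editor version, only a plural line that immediately follows its singular line is changed; the finishes line stays untouched here because no line beginning with finish TAB precedes it.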

Best Regards,

guy038

Giuseppe Pulitanò

I tried it but it doesn’t work…

PS: once you find the regex, could you also give me a version for two general terms? For example, item1 and item2 instead of (plural of …English).

guy038

Hello, @giuseppe-pulitanò, and All,

Before extending the regex to some general cases, it would be better to solve the present problem !

For my part, I re-checked my regex against your sample text, and it works fine!

Note that regexes are extremely sensitive to the actual text! I mean that a single additional space, somewhere, may cause the regular expression to fail. So, if your text is neither personal nor confidential, could you send me part of it by e-mail, so that I can run additional tests?
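
One way to look for such stray characters before sending the file is a small Python check like the sketch below. The function name suspicious_lines and the list of checks are assumptions chosen for illustration (guesses at common culprits, not a diagnosis of the actual file): it reports lines that do not contain exactly one TAB, have a space around the header word, end with whitespace, or contain a non-breaking space, and shows them with repr() so invisible characters become visible.

```python
def suspicious_lines(text):
    """Return (line_number, reasons, repr(line)) for lines that look irregular."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # ignore empty lines
        reasons = []
        if line.count("\t") != 1:
            reasons.append("expected exactly one TAB")
        else:
            head = line.split("\t")[0]
            if head != head.strip():
                reasons.append("space around the header word")
        if line != line.rstrip():
            reasons.append("trailing whitespace")
        if "\u00a0" in line:
            reasons.append("non-breaking space")
        if reasons:
            findings.append((number, reasons, repr(line)))
    return findings


if __name__ == "__main__":
    demo = "finisher\tdefinition\nfinishers \tdefinition"
    for number, reasons, shown in suspicious_lines(demo):
        print(number, ", ".join(reasons), shown)
```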

Thanks

See you later,

guy038

Giuseppe Pulitanò

Hi guy038, I sent you the email with the file to the address:

Many thanks