Help with the regexes



  • Hi guys, I need a regex or any suggestion for this:

    I must

    • select all the words found between the terms "plural of" and "English", throughout the file,
    • select the lines beginning with those same words,
    • copy each such line next to the corresponding (plural of…English)

    Example:

    finisher n.\n1 A person who finishes or completes something
    finishers n.\n(plural of finisher English)
    finishes n.\n(plural of finish English)
    finishes off vb.\n(en-third-person singular of: finish off)
    finishest vb.\n(en-archaic second-person singular of: finish)
    finisheth vb.\n(en-archaic third-person singular of: finish)
    finishing n.\n1 the act of completing something
    finish.\nvb.\n(present participle of finish English)
    finishing line n.\n(alternative form of finish line English)
    finishing lines n.\n(plural of finishing line English)
    finishing move n.\n(context video games English) In media such as…
    finishing moves n.\n(plural of finishing move English)
    finishing off vb.\n(present participle of finish off English)

    Later the text must become:

    finisher n.\n1 A person who finishes or completes something
    finishers n.\n(plural of finisher English) n.\n1 A person who finishes or completes something
    finishes n.\n(plural of finish English)
    finishes off vb.\n(en-third-person singular of: finish off)
    finishest vb.\n(en-archaic second-person singular of: finish)
    finisheth vb.\n(en-archaic third-person singular of: finish)
    finishing n.\n1 the act of completing something
    finish.\nvb.\n(present participle of finish English)
    finishing line n.\n(alternative form of finish line English)
    finishing lines n.\n(plural of finishing line English) n.\n(alternative form of finish line English)
    finishing move n.\n(context video games English) In media such as…
    finishing moves n.\n(plural of finishing move English) n.\n(context video games English) In media such as…
    finishing off vb.\n(present participle of finish off English)



  • Hi,
    The first part is relatively easy:
    plural of.*English

    I will search for the rest a bit later.
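
    As a quick check outside Notepad++, here is a minimal Python sketch of that first step. The `sample` string and the lazy capturing variant `plural of (.+?) English` are assumptions added for illustration; only `plural of.*English` comes from the post.

```python
import re

# Three sample entries; "\t" is a real tab, "\\n" is the literal two-character
# sequence backslash + n that appears inside the definitions.
sample = (
    "finishers\tn.\\n(plural of finisher English)\n"
    "finishes off\tvb.\\n(en-third-person singular of: finish off)\n"
    "finishing lines\tn.\\n(plural of finishing line English)\n"
)

# Pattern from the post: matches the whole "plural of ... English" span.
whole_span = re.compile(r"plural of.*English")

# Assumed variant: a lazy capture group isolates the word(s) between the terms.
between = re.compile(r"plural of (.+?) English")

spans = whole_span.findall(sample)
words = between.findall(sample)

print(spans)
print(words)
```

    By default the dot does not match a line-break in Python either, so each match stays on its own line.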



  • @giuseppe-pulitanò, and All,

    Before giving you a regex solution, I noticed a peculiarity in your text. Examining some of your lines, for example:

    finisher n.\n1
    finishing line n.\n
    finishing off vb.\n
    

    We deduce that a word (like finisher) or a group of words (like finishing off) is always followed by a space character … except for the line beginning with finish.\nvb.\n......, where a dot immediately follows the word finish!

    Is this syntax common in your definitions? Or should that particular line be written finish .\nvb.\n...... or even finish ???.\nvb.\n......, where ??? represents an abbreviation?

    See you later,

    Best Regards,

    guy038



    It is a TABfile dictionary; each entry is followed by a TAB character:

    wordTABdefinition

    So it is:

    finisherTABn.\n1 A person who finishes or completes something
    finishersTABn.\n(plural of finisher English)
    finishesTABn.\n(plural of finish English)
    finishes offTABvb.\n(en-third-person singular of: finish off)
    finishestTABvb.\n(en-archaic second-person singular of: finish)
    finishethTABvb.\n(en-archaic third-person singular of: finish)
    finishingTABn.\n1 the act of completing something
    finishTAB.\nvb.\n(present participle of finish English)
    finishing lineTABn.\n(alternative form of finish line English)
    finishing linesTABn.\n(plural of finishing line English)
    finishing moveTABn.\n(context video games English) In media such as…
    finishing movesTABn.\n(plural of finishing move English)
    finishing offTABvb.\n(present participle of finish off English)

    Later the text must become:

    finisherTABn.\n1 A person who finishes or completes something
    finishersTABn.\n(plural of finisher English) n.\n1 A person who finishes or completes something
    finishesTABn.\n(plural of finish English)
    finishes offTABvb.\n(en-third-person singular of: finish off)
    finishestTABvb.\n(en-archaic second-person singular of: finish)
    finishethTABvb.\n(en-archaic third-person singular of: finish)
    finishingTABn.\n1 the act of completing something
    finishTAB.\nvb.\n(present participle of finish English)
    finishing lineTABn.\n(alternative form of finish line English)
    finishing linesTABn.\n(plural of finishing line English) n.\n(alternative form of finish line English)
    finishing moveTABn.\n(context video games English) In media such as…
    finishing movesTABn.\n(plural of finishing move English) n.\n(context video games English) In media such as…
    finishing offTABvb.\n(present participle of finish off English)



  • Hi, @giuseppe-pulitanò, and All,

    Ah, perfect! That makes it even easier to build the correct regex, as the \t tabulation char separates each header word from its definition, without any ambiguity :-)). So:

    • Open the Replace dialog ( Ctrl + H )

    • SEARCH (?-is)^(.+)\t(.+)\R.+plural of\x20\1\x20English\)

    • REPLACE $0\x20\2

    • Select the Regular expression search mode

    • Tick, preferably, the Wrap around option

    • Click, once, on the Replace All button or several times on the Replace button

    Et voilà !

    Notes :

    • The search part tries to match two consecutive lines where the header word of the first line, with its exact case, is embedded in the expression plural of ...... English at the end of the second line, also with its exact case.

    • At the beginning, the part (?-is) means that:

      • The search is performed in a case-sensitive way, (?-i)

      • The regex engine considers the dot as matching any single standard character only (not an EOL one), (?-s)

    • Then, the (.+)\t(.+)\R part matches the first line, with its line-break, and stores, as groups 1 and 2, the text located before and after the tabulation separator \t, respectively

    • And the final part .+plural of\x20\1\x20English\) matches the second line, up to and including English), on the condition that the header word, \1, is located between the expressions plural of and English, with this exact case

    • In the replacement, we first rewrite these two lines, untouched, with $0, followed by a space char, \x20, and the definition part of the previous line, \2
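
    For readers who want to try the same idea outside Notepad++, here is a rough Python port of the search and replacement above. It is only a sketch under some assumptions: Python's re module has no \R and no global (?-is) switch, so the pattern uses \n and relies on the default flags (case-sensitive, dot does not match a line-break); the text is assumed to use LF line endings; and the function name is hypothetical.

```python
import re

# Group 1 = header word, group 2 = its definition, then a line-break and a
# second line containing "plural of <header word> English)".
# re.MULTILINE lets ^ match at the start of every line.
PATTERN = re.compile(r"^(.+)\t(.+)\n.+plural of \1 English\)", re.MULTILINE)


def append_singular_definition(text: str) -> str:
    """Rewrite both lines untouched, then add a space and the first line's definition."""
    # \g<0> is the whole match (the $0 of the Notepad++ replacement), \2 is group 2.
    return PATTERN.sub(r"\g<0> \2", text)


sample = (
    "finisher\tn.\\n1 A person who finishes or completes something\n"
    "finishers\tn.\\n(plural of finisher English)\n"
    "finishes\tn.\\n(plural of finish English)\n"
)
print(append_singular_definition(sample))
```

    As with the Notepad++ regex, only a plural entry that sits on the line directly after its singular entry is changed.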

    Best Regards,

    guy038



  • @guy038, I tried it but it doesn’t work…

    PS: if you find the regex, could you also give me a regex for two general terms? For example, item1 and item2 instead of (plural of … English).
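
    To make the "two general terms" idea concrete, here is a minimal Python sketch, not a Notepad++ regex. Everything in it is an assumption for illustration: the function name is hypothetical, item1 and item2 are treated as literal text, the header word is taken as the text before the first TAB, and, unlike a regex over two consecutive lines, the referenced entry is looked up anywhere in the file.

```python
import re


def append_referenced_definitions(text: str, item1: str, item2: str) -> str:
    """Append to each line the definition of the entry named between item1 and item2."""
    lines = text.split("\n")

    # Map each header word to its definition (the text after the first TAB).
    definitions = {}
    for line in lines:
        header, sep, definition = line.partition("\t")
        if sep:
            definitions.setdefault(header, definition)

    # re.escape makes both terms literal; the lazy group captures what lies between them.
    between = re.compile(re.escape(item1) + r"(.+?)" + re.escape(item2))

    result = []
    for line in lines:
        match = between.search(line)
        target = match.group(1).strip() if match else None
        if target in definitions:
            line = f"{line} {definitions[target]}"
        result.append(line)
    return "\n".join(result)


sample = (
    "finishing move\tn.\\n(context video games English) In media\n"
    "finisher\tn.\\n1 A person\n"
    "finishing moves\tn.\\n(plural of finishing move English)"
)
print(append_referenced_definitions(sample, "plural of ", " English"))
```

    Calling it with "plural of " and " English" covers the case in this thread; any other pair of terms can be passed the same way.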



  • Hello, @giuseppe-pulitanò, and All,

    Before extending the regex to more general cases, it would be better to solve the present problem!

    As for me, I re-checked my regex against your sample text, and it works fine!

    Note that regexes are extremely sensitive to the actual text! I mean that a single additional space, somewhere, may cause the regular expression to fail. So, if your text is neither personal nor confidential, could you send me part of it by e-mail, so that I can do additional tests?

    tguy.038@gmail.com

    Thanks
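
    In the meantime, a small Python sketch like the following can reveal the kind of invisible difference mentioned above. The function name and the particular checks are assumptions chosen for illustration, not a complete diagnostic.

```python
def describe_line(line: str) -> dict:
    """Report whitespace details that can silently break a regex match."""
    return {
        "repr": repr(line),                            # shows \t, \r and similar explicitly
        "tab_count": line.count("\t"),                 # one TAB per entry is expected
        "trailing_whitespace": line != line.rstrip(),  # spaces, tabs or \r at the end
        "double_space": "  " in line,                  # two consecutive spaces
        "non_breaking_space": "\u00a0" in line,        # looks like a space, is not \x20
    }


clean = describe_line("finishers\tn.\\n(plural of finisher English)")
dirty = describe_line("finishers\tn.\\n(plural of  finisher English) ")
print(clean)
print(dirty)
```

    Here the second line differs from the first only by a doubled space and a trailing space, which is enough for plural of\x20\1\x20English to stop matching.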

    See you later,

    guy038



  • Hi guy038, I sent you the email with the file at the address tguy.038@gmail.com

    Many thanks

