    Help with the regexes

    Help wanted
    • Giuseppe Pulitanò
      Giuseppe Pulitanò last edited by

      Hi guys, I need a regex or any suggestion for this:

      I must

      • select all the words that appear between the terms "plural of" and "English" throughout the file,
      • find the lines beginning with those same words,
      • copy the definition from each of those lines next to the corresponding (plural of…English) entry.

      Example:

      finisher n.\n1 A person who finishes or completes something
      finishers n.\n(plural of finisher English)
      finishes n.\n(plural of finish English)
      finishes off vb.\n(en-third-person singular of: finish off)
      finishest vb.\n(en-archaic second-person singular of: finish)
      finisheth vb.\n(en-archaic third-person singular of: finish)
      finishing n.\n1 the act of completing something
      finish.\nvb.\n(present participle of finish English)
      finishing line n.\n(alternative form of finish line English)
      finishing lines n.\n(plural of finishing line English)
      finishing move n.\n(context video games English) In media such as…
      finishing moves n.\n(plural of finishing move English)
      finishing off vb.\n(present participle of finish off English)

      Later the text must become:

      finisher n.\n1 A person who finishes or completes something
      finishers n.\n(plural of finisher English) n.\n1 A person who finishes or completes something
      finishes n.\n(plural of finish English)
      finishes off vb.\n(en-third-person singular of: finish off)
      finishest vb.\n(en-archaic second-person singular of: finish)
      finisheth vb.\n(en-archaic third-person singular of: finish)
      finishing n.\n1 the act of completing something
      finish.\nvb.\n(present participle of finish English)
      finishing line n.\n(alternative form of finish line English)
      finishing lines n.\n(plural of finishing line English) n.\n(alternative form of finish line English)
      finishing move n.\n(context video games English) In media such as…
      finishing moves n.\n(plural of finishing move English)n.\n(context video games English) In media such as…
      finishing off vb.\n(present participle of finish off English)

      • Tom Saury
        Tom Saury last edited by

        Hi,
        The first is relatively easy :
        plural of.*English

        I will search for the rest a bit later.

        • guy038
          guy038 last edited by guy038

          @giuseppe-pulitanò, and All,

          Before giving you a regex solution, I noticed a peculiarity in your text. Consider some of your lines, for example :

          finisher n.\n1
          finishing line n.\n
          finishing off vb.\n
          

          We deduce that a word ( like finisher ) OR a group of words ( like finishing off ) is always followed by a space character … except for the line beginning with finish.\nvb.\n......, where a dot immediately follows the word finish !

          Is this syntax common in your definitions ? Or should that particular line be written finish .\nvb.\n...... or even finish ???.\nvb.\n......, where ??? represents an abbreviation ?

          See you later,

          Best Regards,

          guy038

          • Giuseppe Pulitanò
            Giuseppe Pulitanò @guy038 last edited by

            It is a TAB-file dictionary; each entry word is followed by a TAB character:

            wordTABdefinition

            So it is:

            finisherTABn.\n1 A person who finishes or completes something
            finishersTABn.\n(plural of finisher English)
            finishesTABn.\n(plural of finish English)
            finishes offTABvb.\n(en-third-person singular of: finish off)
            finishestTABvb.\n(en-archaic second-person singular of: finish)
            finishethTABvb.\n(en-archaic third-person singular of: finish)
            finishingTABn.\n1 the act of completing something
            finishTAB.\nvb.\n(present participle of finish English)
            finishing lineTABn.\n(alternative form of finish line English)
            finishing linesTABn.\n(plural of finishing line English)
            finishing moveTABn.\n(context video games English) In media such as…
            finishing movesTABn.\n(plural of finishing move English)
            finishing offTABvb.\n(present participle of finish off English)

            Later the text must become:

            finisherTABn.\n1 A person who finishes or completes something
            finishersTABn.\n(plural of finisher English) n.\n1 A person who finishes or completes something
            finishesTABn.\n(plural of finish English)
            finishes offTABvb.\n(en-third-person singular of: finish off)
            finishestTABvb.\n(en-archaic second-person singular of: finish)
            finishethTABvb.\n(en-archaic third-person singular of: finish)
            finishingTABn.\n1 the act of completing something
            finishTAB.\nvb.\n(present participle of finish English)
            finishing lineTABn.\n(alternative form of finish line English)
            finishing linesTABn.\n(plural of finishing line English) n.\n(alternative form of finish line English)
            finishing moveTABn.\n(context video games English) In media such as…
            finishing movesTABn.\n(plural of finishing move English)n.\n(context video games English) In media such as…
            finishing offTABvb.\n(present participle of finish off English)

            • guy038
              guy038 last edited by guy038

              Hi, @giuseppe-pulitanò, and All,

              Ah, perfect ! It is even easier to build the correct regex, since the \t tabulation char separates, without any ambiguity, each header word from its definition :-)). So :

              • Open the Replace dialog ( Ctrl + H )

              • SEARCH (?-is)^(.+)\t(.+)\R.+plural of\x20\1\x20English\)

              • REPLACE $0\x20\2

              • Select the Regular expression search mode

              • Tick, preferably, the Wrap around option

              • Click, once, on the Replace All button or several times on the Replace button

              Et voilà !

              Notes :

              • The search part tries to match two consecutive lines where the header word of the first line appears, with its exact case, inside the expression plural of ...... English) at the end of the second line.

              • At the beginning, the part (?-is) means that :

                • The search is performed in a case-sensitive way, (?-i)

                • The regex engine treats the dot as matching any single standard character only ( not an EOL one ), (?-s)

              • Then, the ^(.+)\t(.+)\R part matches the first line, with its line-break, and stores, as groups 1 and 2, the text located before and after the tabulation separator \t, respectively

              • And the final part .+plural of\x20\1\x20English\) matches the whole second line, on the condition that the header word, \1, is located between the expressions plural of and English, with this exact case

              • In replacement, we first rewrite these two lines, untouched, $0, followed by a space char, \x20, and the definition part of the first line, \2
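
For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of the search/replace described in the notes above. It is an approximation, not the exact Notepad++ behaviour: Python's `re` module has no `\R`, so the line break is written `\r?\n`; the `(?-is)` prefix is dropped because case-sensitive matching and a dot that does not match line breaks are already the defaults in `re`; and the groups use `[^\t\r\n]` / `[^\r\n]` instead of `.+` so that a stray `\r` is never captured. The function name `append_singular_definitions` is hypothetical.

```python
import re

# Two consecutive lines: "head<TAB>definition" followed by a line that
# contains "plural of <head> English)". Matching is case-sensitive.
PATTERN = re.compile(
    r"^([^\t\r\n]+)\t([^\r\n]+)\r?\n.+plural of \1 English\)",
    re.MULTILINE,
)


def append_singular_definitions(text: str) -> str:
    """Append the first line's definition to the following 'plural of' line."""
    # A function is used as the replacement so that backslashes inside the
    # captured definition (the literal "\n" markers) are copied verbatim.
    return PATTERN.sub(lambda m: f"{m.group(0)} {m.group(2)}", text)


if __name__ == "__main__":
    sample = (
        "finisher\tn.\\n1 A person who finishes or completes something\n"
        "finishers\tn.\\n(plural of finisher English)\n"
        "finishes\tn.\\n(plural of finish English)\n"
    )
    print(append_singular_definitions(sample))
```

As with the Notepad++ regex, only a plural entry that sits directly below its singular entry is changed; the finishes line stays as it is because the line above it is not the finish entry.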

              Best Regards,

              guy038

              • Giuseppe Pulitanò
                Giuseppe Pulitanò @guy038 last edited by

                I tried it but it doesn’t work…

                PS: if you find the regex, could you also give me a version for two general terms ? For example, item1 and item2 instead of (plural of …English)
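
The thread does not answer this generalisation, so what follows is only a hedged Python sketch of one way it could look, not a Notepad++ regex from the participants. It assumes the same TAB-separated layout, that the entry to copy from is the line directly above, and that the two terms are plain literal strings. The function name `append_definition_between` and its parameters `before` and `after` are hypothetical.

```python
import re


def append_definition_between(text: str, before: str, after: str) -> str:
    """Append the previous entry's definition to a line citing its header word.

    `before` and `after` are the literal markers around the header word,
    for example "plural of " and " English)".
    """
    pattern = re.compile(
        # First line: header word, TAB, definition, then the line break.
        rf"^(?P<head>[^\t\r\n]+)\t(?P<definition>[^\r\n]+)\r?\n"
        # Second line: anything, then before + same header word + after.
        rf".+{re.escape(before)}(?P=head){re.escape(after)}",
        re.MULTILINE,
    )
    # A function replacement copies the captured definition verbatim.
    return pattern.sub(lambda m: f"{m.group(0)} {m.group('definition')}", text)


if __name__ == "__main__":
    demo = "alpha\tfirst definition\nalphas\tsee item1 alpha item2\n"
    print(append_definition_between(demo, "item1 ", " item2"))
```

`re.escape` keeps characters such as the closing parenthesis in " English)" from being read as regex syntax, and the named back-reference `(?P=head)` avoids ambiguity if the `after` term starts with a digit.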

                • guy038
                  guy038 last edited by guy038

                  Hello, @giuseppe-pulitanò, and All,

                  Before extending the regex to some general cases, it would be better to solve the present problem !

                  As for me, I re-verified my regex, against your sample text, and it’s working fine !

                  Note that regexes are extremely sensitive to real text ! I mean that a simple additional space, somewhere, may cause the regular expression to fail ! So, if your text is neither personal nor confidential, could you send me, by e-mail, part of your text, in order to do additional tests ?

                  Thanks

                  See you later,

                  guy038

                  • Giuseppe Pulitanò
                    Giuseppe Pulitanò last edited by guy038

                    Hi guy038, I sent you the email with the file at the address :

                    Many thanks
