    Help with the regexes

    • Giuseppe Pulitanò

      Hi guys, I need a regex, or any suggestion, for this:

      I must

      • select all the words contained between the terms "plural of" and "English" throughout the file,
      • select the lines beginning with those same words,
      • copy the text of each such line beside the corresponding (plural of…English) entry.

      Example:

      finisher n.\n1 A person who finishes or completes something
      finishers n.\n(plural of finisher English)
      finishes n.\n(plural of finish English)
      finishes off vb.\n(en-third-person singular of: finish off)
      finishest vb.\n(en-archaic second-person singular of: finish)
      finisheth vb.\n(en-archaic third-person singular of: finish)
      finishing n.\n1 the act of completing something
      finish.\nvb.\n(present participle of finish English)
      finishing line n.\n(alternative form of finish line English)
      finishing lines n.\n(plural of finishing line English)
      finishing move n.\n(context video games English) In media such as…
      finishing moves n.\n(plural of finishing move English)
      finishing off vb.\n(present participle of finish off English)

      Later the text must become:

      finisher n.\n1 A person who finishes or completes something
      finishers n.\n(plural of finisher English) n.\n1 A person who finishes or completes something
      finishes n.\n(plural of finish English)
      finishes off vb.\n(en-third-person singular of: finish off)
      finishest vb.\n(en-archaic second-person singular of: finish)
      finisheth vb.\n(en-archaic third-person singular of: finish)
      finishing n.\n1 the act of completing something
      finish.\nvb.\n(present participle of finish English)
      finishing line n.\n(alternative form of finish line English)
      finishing lines n.\n(plural of finishing line English) n.\n(alternative form of finish line English)
      finishing move n.\n(context video games English) In media such as…
      finishing moves n.\n(plural of finishing move English) n.\n(context video games English) In media such as…
      finishing off vb.\n(present participle of finish off English)

      • Tom Saury

        Hi,
        The first part is relatively easy:
        plural of.*English
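
A minimal sketch of that first step in Python, assuming the entries are available as a list of strings (the `lines` sample and the name `BASE_WORD` are only illustrative). Note that `.*` is greedy, so on a line containing " English" twice it would run to the last occurrence; the lazy `(.+?)` with a capture group is a small variation that isolates the word itself:

```python
import re

# Lazy (.+?) stops at the first " English", so each entry yields one base word.
BASE_WORD = re.compile(r"plural of (.+?) English")

# Illustrative sample; "\\n" is the literal backslash-n found in the dictionary text.
lines = [
    "finishers\tn.\\n(plural of finisher English)",
    "finishing lines\tn.\\n(plural of finishing line English)",
    "finishing off\tvb.\\n(present participle of finish off English)",
]

# Collect the word(s) found between "plural of" and "English".
base_words = [m.group(1) for line in lines if (m := BASE_WORD.search(line))]
print(base_words)
```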

        I will search for the rest a bit later.

        • guy038

          @giuseppe-pulitanò, and All,

          Before giving you a regex solution, I noticed a peculiarity in your text. Examining some of your lines, for example:

          finisher n.\n1
          finishing line n.\n
          finishing off vb.\n
          

          We deduce that a word (like finisher) OR a group of words (like finishing off) is always followed by a space character … except for the line beginning with finish.\nvb.\n......, where a dot immediately follows the word finish!

          Is this syntax common in your definitions? Or should that particular line be written finish .\nvb.\n...... or even finish ???.\nvb.\n......, where ??? represents an abbreviation?

          See you later,

          Best Regards,

          guy038

          • Giuseppe Pulitanò @guy038

            It is a TABfile dictionary; each entry word is followed by a TAB character:

            wordTABdefinition

            So it is:

            finisherTABn.\n1 A person who finishes or completes something
            finishersTABn.\n(plural of finisher English)
            finishesTABn.\n(plural of finish English)
            finishes offTABvb.\n(en-third-person singular of: finish off)
            finishestTABvb.\n(en-archaic second-person singular of: finish)
            finishethTABvb.\n(en-archaic third-person singular of: finish)
            finishingTABn.\n1 the act of completing something
            finishTAB.\nvb.\n(present participle of finish English)
            finishing lineTABn.\n(alternative form of finish line English)
            finishing linesTABn.\n(plural of finishing line English)
            finishing moveTABn.\n(context video games English) In media such as…
            finishing movesTABn.\n(plural of finishing move English)
            finishing offTABvb.\n(present participle of finish off English)

            Later the text must become:

            finisherTABn.\n1 A person who finishes or completes something
            finishersTABn.\n(plural of finisher English) n.\n1 A person who finishes or completes something
            finishesTABn.\n(plural of finish English)
            finishes offTABvb.\n(en-third-person singular of: finish off)
            finishestTABvb.\n(en-archaic second-person singular of: finish)
            finishethTABvb.\n(en-archaic third-person singular of: finish)
            finishingTABn.\n1 the act of completing something
            finishTAB.\nvb.\n(present participle of finish English)
            finishing lineTABn.\n(alternative form of finish line English)
            finishing linesTABn.\n(plural of finishing line English) n.\n(alternative form of finish line English)
            finishing moveTABn.\n(context video games English) In media such as…
            finishing movesTABn.\n(plural of finishing move English) n.\n(context video games English) In media such as…
            finishing offTABvb.\n(present participle of finish off English)

            • guy038

              Hi, @giuseppe-pulitanò, and All,

              Ah, perfect! It is even easier to build the correct regex, as the \t tabulation character separates, without any ambiguity, each header word from its definition :-)). So:

              • Open the Replace dialog ( Ctrl + H )

              • SEARCH (?-is)^(.+)\t(.+)\R.+plural of\x20\1\x20English\)

              • REPLACE $0\x20\2

              • Select the Regular expression search mode

              • Tick, preferably, the Wrap around option

              • Click, once, on the Replace All button or several times on the Replace button

              Et voilà !

              Notes :

              • The search part tries to match two consecutive lines where the header word of the first line, with its exact case, is embedded in the expression plural of ...... English) at the end of the second line, also with its exact case.

              • At the beginning, the part (?-is) means that:

                • The search is case-sensitive, (?-i)

                • The regex engine treats the dot as matching any single standard character only (not an EOL one), (?-s)

              • Then, the (.+)\t(.+)\R part matches the first line with its line-break and stores, as groups 1 and 2, the text located before and after the tabulation separator \t, respectively

              • And the final part .+plural of\x20\1\x20English\) matches the second line, up to the closing English), on the condition that the header word, \1, is located between the expressions plural of and English, with this exact case

              • In the replacement, we first rewrite these two lines, untouched, with $0, followed by a space char, \x20, and the definition part of the previous line, \2
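
For anyone who wants to try the same idea outside Notepad++, here is a rough Python equivalent. It is only a sketch under a few assumptions: Python's `re` module has no `\R`, so a plain `\n` stands in for the line-break (text read in Python's default text mode already has its line endings normalised to `\n`), and the function name and `sample` text are made up for illustration.

```python
import re

# Same structure as the Notepad++ search: header, TAB, definition, line-break,
# then a second line containing "plural of <header> English)".
# re.MULTILINE lets ^ match at the start of every line; the dot never matches \n.
PAIR = re.compile(r"^(.+)\t(.+)\n.+plural of \1 English\)", re.MULTILINE)


def append_singular_definition(text: str) -> str:
    """Rewrite each matched pair untouched, then add a space and group 2."""
    return PAIR.sub(lambda m: f"{m.group(0)} {m.group(2)}", text)


# "\\n" is the literal backslash-n of the dictionary text; "\n" is a real line-break.
sample = (
    "finisher\tn.\\n1 A person who finishes or completes something\n"
    "finishers\tn.\\n(plural of finisher English)\n"
    "finishes\tn.\\n(plural of finish English)\n"
)

result = append_singular_definition(sample)
print(result)
```

As in the Notepad++ version, only a plural entry that directly follows its singular entry is changed; the finishes line stays as it is because the line before it does not start with finish.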

              Best Regards,

              guy038

              • Giuseppe Pulitanò @guy038

                I tried it but it doesn’t work…

                PS: once the regex works, could you also give me a version for two general terms? For example, item1 and item2 instead of (plural of …English).

                • guy038

                  Hello, @giuseppe-pulitanò, and All,

                  Before extending the regex to some general cases, it would be better to solve the present problem!

                  As for me, I re-verified my regex against your sample text, and it’s working fine!

                  Note that regexes are extremely sensitive to the real text! I mean that a single additional space, somewhere, may cause the regular expression to fail. So, if your text is neither personal nor confidential, could you send me part of it by e-mail, so that I can do additional tests?
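
One way to look for such differences before sending a file is a small diagnostic script. The following Python sketch is only a suggestion, not something tested on the real dictionary; the function name and the sample lines are hypothetical. It lists every "plural of … English)" entry whose previous line does not start with exactly the same header word followed by a TAB, which is what the two-line regex relies on:

```python
import re

# Lazy capture of whatever sits between "plural of " and " English)".
PLURAL = re.compile(r"plural of (.+?) English\)")


def find_unmatched_plurals(lines):
    """Return (line_number, expected_header, previous_header) for skipped entries."""
    problems = []
    for i, line in enumerate(lines):
        m = PLURAL.search(line)
        if not m:
            continue
        # Header of the previous line = text before its first TAB (None on line 1).
        prev_header = lines[i - 1].split("\t", 1)[0] if i > 0 else None
        if prev_header != m.group(1):
            problems.append((i + 1, m.group(1), prev_header))
    return problems


sample_lines = [
    "finisher\tn.\\n1 A person who finishes or completes something",
    "finishers\tn.\\n(plural of finisher English)",
    "finishes\tn.\\n(plural of finish English)",
]

report = find_unmatched_plurals(sample_lines)
for number, expected, found in report:
    print(f"line {number}: expected header {expected!r}, previous line has {found!r}")
```

A stray space, a missing TAB, or a singular entry that is not on the line just before its plural would all show up in this report.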

                  Thanks

                  See you later,

                  guy038

                    • Giuseppe Pulitanò
                    last edited by guy038

                    Hi guy038, I sent you the e-mail with the file at the address:

                    Many thanks

                    The Community of users of the Notepad++ text editor.