    Regex for dictionary entries

    General Discussion
    • glossar
      glossar last edited by

      Hi
      I’m looking for a regex that would take the main entry at the beginning of the line, substitute it for the tilde character that follows any numeral, and put the result on a separate line together with its definition(s). I also need another regex that would delete all the usage examples and their translations, which are set in either bold or italic, while keeping the rest intact.
      Hence:
      bankrupt [tab] 1. blah blah 2. blah blah 3. blah blah
      bankruptly [tab] müflisane, iflas ederek/etmişçesine
      (Assuming each main entry is followed by a tab)

      Could someone provide me with one?
      Many thanks in advance!

      [Two screenshots of the dictionary text, not reproduced here]

      • glossar
        glossar last edited by

        I can’t edit my previous post. I meant “headword” instead of “main entry”.

        • guy038
          guy038 last edited by guy038

          Hello, @glossar,

          Regular expressions are very powerful, indeed ! But, unfortunately, they cannot detect bold and italic variations of a font :-(( So, we’ll have to find common boundaries for these zones !

          Now, as I cannot really exploit your two pictures ( not true text ! ), I advise you to post an example of your initial text and of the resulting text that you expect, after one or several consecutive regex S/R !

          Simply, use this syntax, to get raw text, not processed in any way :

          ~~~diff
          From the INITIAL text :

          Your
          …
          text
          …
          here

          I would like this EXPECTED text :

          Your
          …
          changed
          …
          text
          ~~~

          which will be displayed as :

          From the INITIAL text :
          
          Your
          ...
          text
          ...
          here
          
          I would like this EXPECTED text :
          
          Your
          ...
          changed
          ...
          text
          

          If you prefer, just send me some of your examples, by e-mail, to :

          But, please, associate each exact AFTER text, that you want, to its exact BEFORE text, that you get ;-))

          Remember that sometimes, an additional or a missing single character may cause regular expressions to fail !

          See you later,

          Best regards,

          guy038

          • glossar
            glossar last edited by

            Hi guy,
            Thank you for your reply. Below I’ve posted the texts the way you described. It is my bad that I posted screenshots for visual convenience rather than workable text. The original file is a PDF, scanned from a hard copy of a bilingual dictionary, which was then converted to a Word file, which in turn was saved as an html/txt file for processing. There are still repeated patterns to a usable degree in both the converted Word file and the html/txt one, although the said conversion introduced wrongly OCR-ed characters and distorted the format a bit. I’m aware that formatting gets lost in plain text and that regex has nothing to do with formatting. In my previous posts I forgot to mention that the first screenshot was taken from the Word file, and that I would welcome any (sort of) regex that I could also use within Word, in combination with format selection, hence my mentioning the formatting.

            BTW, the number that precedes the tilde character for suffixes (-ly, -ness, -ship, etc.) is either one or two digits.

            From the INITIAL text:
            
            bankrupt [tab] 1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi. 5. ~ly : müflisane, iflâs ede-rek/etmişçesine.
            
            I would like this EXPECTED text preferably: :)
            
            bankrupt [tab] 1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, 2. meteliksiz, 3. yoksun, mahrum, düşkün 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek.
            bankruptly [tab] müflisane, iflâs ede-rek/etmişçesine.
            
            
            if the above one is not possible, then I would like this EXPECTED text: :)
            bankrupt [tab] 1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi.
            bankruptly [tab] müflisane, iflâs ede-rek/etmişçesine.
            • guy038
              guy038 last edited by

              Hi, @glossar and All,

              Thanks for following my advice which allows anyone of us to grab your plain text !

              I’ve already figured out how to do it, with two regexes, but I still need some pieces of information :

              • A) : Are all the definitions of a headword placed one after another, without any line-break in between ?

              • B) : Do the headwords always begin the lines, or could there be some blank characters before each headword ?

              • C) : Right after each headword, is the present sequence of characters always :

                • TABULATION char + the string 1. + a SPACE char + definition(s)… ( case of bankrupt header )

                • TABULATION char + definition(s)… ( case of bankruptly header )

              or anything else ? For instance spaces before and/or after the tabulation character ?

              BR

              guy038

              • glossar
                glossar last edited by

                Hi guy,
                Thank you.
                To answer your questions:

                • A) : No, but you can assume that they are, since the majority of them are placed that way. I can gladly ignore and delete the ones left at the end that don’t follow the pattern in question, or I might go through them visually and fix them manually if it is worth it. But just in case you can work some magic with regex and also fix the ones with a line-break in between, there are some entries like the one below (again, due to the distortion/loss arising from the conversion):
                  bankrupt[tab]1. huk. batkın, müflis, batmış, iflâs etmiş, bor
                  çlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis,
                  2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir
                  yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk
                  düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi. 5. ~ly : müflisane, iflâs ede-rek/etmişçesine.

                • B) : Yes, the headwords always begin the lines.

                • C) :
                  TABULATION + (0 space/char) + 1. + (0 or 1 or more chars/spaces) + definition(s)… ( case of bankrupt header )
                  TABULATION + (0 space/char) + (0 or 1 number followed by a dot) + (0 or 1 or more chars/spaces) + definition(s)… ( case of bankruptly header )

                • guy038
                  guy038 last edited by guy038

                  Hi, @glossar and All,

                  Thanks for your additional hints ! So, here is my first try ! I will consider the TEST text, below :

                  
                  
                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, bor
                  çlarını ödeye
                  meyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi. 5. ~ly : müflisane, iflâs ede-rek/etmişçesine. 6. ~able : müflisane, iflâs ede-rek/etmişçesine.
                  
                  
                  
                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind fee
                  lings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi. 5. ~ly : müflisane, iflâs ede-rek/etmişçesine, to be ~ : yoksun olmak. 6. ~able : müflisane, iflâs ede-rek/etmişçesine, to be ~ : yoksun olmak.
                  

                  In that TEST text, @glossar, you’ll notice several particularities :

                  • I duplicated the bankrupt header word, with some line-breaks, between, to simulate a second header word, below the first one !

                  • In the first bankrupt header word, I decided to split text, that you want to keep, twice. So you get the text :

                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, bor
                  çlarını ödeye
                  meyen kimse, to go ~ : batmak, iflâs etmek,.......
                  
                  • In the second bankrupt header word, I decided to split text, that you want to get rid of, once. So you get the text :
                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind fee
                  lings : Her türlü asil duygulardan yoksun görünüyor........
                  
                  • In the two bankrupt header words, I added, at the end of the definitions, the part :
                   6. ~able : müflisane, iflâs ede-rek/etmişçesine.
                  

                  to simulate a third header word bankruptable ( BTW, from DSpellcheck it’s not a correct English word ! )

                  • Finally, in the second bankrupt header word, I also added, at the end of the 5. and new 6. definitions, the following rubbish text :
                  to be ~ : yoksun olmak.
                  

                  to simulate a part of the text which we must get rid of !


                  Now, let’s go, modifying that text, correctly ;-))

                  • Move back to the very beginning of your words list ( Ctrl + Home )

                  • Open the Replace dialog ( Ctrl + H )

                  • Select the Regular expression search mode

                  • Uncheck the Wrap around option

                  • SEARCH (\R)\R*(?=\w+\t)|\R(?=[^\t\r\n]+\R)

                  • REPLACE ?1\1

                  • Click ONCE on the Replace All button

                  This first regex S/R will perform two things :

                  • It will delete any line-break between header words

                  • It will delete any additional line-break, wrongly added during the conversion phase

                  So, you get the following text :

                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi. 5. ~ly : müflisane, iflâs ede-rek/etmişçesine. 6. ~able : müflisane, iflâs ede-rek/etmişçesine.
                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi. 5. ~ly : müflisane, iflâs ede-rek/etmişçesine, to be ~ : yoksun olmak. 6. ~able : müflisane, iflâs ede-rek/etmişçesine, to be ~ : yoksun olmak.
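
For anyone who wants to script this first step outside Notepad++, here is a minimal Python sketch. It is an adaptation and not the regex above verbatim: Python's `re` module has no `\R` and no conditional replacement such as `?1\1`, so the sketch assumes the line endings are already normalised to `\n` and uses a replacement function instead. The names `JOIN_BREAKS` and `join_broken_lines` are made up for illustration.

```python
import re

# Alternative 1: a line break, plus any further line breaks, right before "headword<TAB>"
#                -> collapsed to a single line break (captured in group 1).
# Alternative 2: a line break followed by a tab-free, non-empty line that itself
#                ends with a line break -> deleted, so the two pieces are joined.
JOIN_BREAKS = re.compile(r"(\n)\n*(?=\w+\t)|\n(?=[^\t\n]+\n)")


def join_broken_lines(text: str) -> str:
    """Keep one line break before each headword, drop the stray ones."""
    # group(1) is None when alternative 2 matched, so the break is removed.
    return JOIN_BREAKS.sub(lambda m: m.group(1) or "", text)


sample = (
    "bankrupt\t1. huk. bor\n"
    "çlarını ödeye\n"
    "meyen kimse.\n"
    "\n"
    "\n"
    "bankrupt\t1. huk. fee\n"
    "lings : x.\n"
)
print(join_broken_lines(sample))
```

As in the Notepad++ version, this relies on the TAB character appearing only right after a headword.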
                  

                  The second regex S/R, below, will create the two new headers bankruptly and bankruptable, after each bankrupt header word :

                  • SEARCH (?s)(\w+)\t[^\t]+\K\x20\d+\.\x20~(\w+)\x20:

                  • REPLACE \r\n\1\2\t1.

                  • Click, SEVERAL times, on the Replace All button exclusively ( Do not use the Replace button ) till the message Replace All: 0 occurrence were replaced occurs ! ( In this example, you’ll need to click, 3 times )

                  You’ll obtain :

                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi.
                  bankruptly	1. müflisane, iflâs ede-rek/etmişçesine.
                  bankruptable	1. müflisane, iflâs ede-rek/etmişçesine.
                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi.
                  bankruptly	1. müflisane, iflâs ede-rek/etmişçesine, to be ~ : yoksun olmak.
                  bankruptable	1. müflisane, iflâs ede-rek/etmişçesine, to be ~ : yoksun olmak.
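
The same second step can be sketched in Python, under a few assumptions. Python's `re` has no `\K`, so this version captures the part before the numeral and writes it back in the replacement. A loop stands in for the repeated clicks on Replace All, and `\n` is used instead of `\r\n`. `SPLIT_SUFFIX` and `split_suffix_entries` are hypothetical names.

```python
import re

# group 1: headword, group 2: TAB plus everything up to the LAST " N. ~suffix :"
# before the next TAB (the class [^\t] also matches line breaks), group 3: suffix.
SPLIT_SUFFIX = re.compile(r"(\w+)(\t[^\t]+) \d+\. ~(\w+) :")


def split_suffix_entries(text: str) -> str:
    """Move every ' N. ~suffix :' definition onto its own 'headword+suffix' line."""
    while True:
        # Each pass peels off the last remaining suffix of every entry.
        text, count = SPLIT_SUFFIX.subn(r"\g<1>\g<2>\n\g<1>\g<3>\t1.", text)
        if count == 0:  # same stop condition as "0 occurrences were replaced"
            return text


sample = "bankrupt\t1. batkın. 5. ~ly : müflisane. 6. ~able : iflâs.\n"
print(split_suffix_entries(sample))
```

Like the original regex, this only recognises the exact layout "space, number, dot, space, tilde, suffix, space, colon".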
                  

                  Finally, the third regex S/R, below, should get rid of all text, containing bold/italic sections :

                  • SEARCH (?<=[,.])[\w\x20]+?~.+?(?=\x20\d+|\R|\z)

                  • REPLACE Leave EMPTY

                  • Click, ONCE on the Replace All button, exclusively ( Again, do not use the Replace button )

                  And… here is your expected text :

                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, 2. meteliksiz, 3. yoksun, mahrum, düşkün, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek.
                  bankruptly	1. müflisane, iflâs ede-rek/etmişçesine.
                  bankruptable	1. müflisane, iflâs ede-rek/etmişçesine.
                  bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, 2. meteliksiz, 3. yoksun, mahrum, düşkün, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek.
                  bankruptly	1. müflisane, iflâs ede-rek/etmişçesine,
                  bankruptable	1. müflisane, iflâs ede-rek/etmişçesine,
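
A rough Python counterpart of this third step, for illustration only: `\R` is replaced by `\n` (assuming normalised line endings) and `\z` by Python's `\Z`. The names `USAGE_EXAMPLE` and `strip_usage_examples` are invented for the sketch.

```python
import re

# After a comma or a full stop: the shortest run of word chars/spaces that reaches
# a tilde, then the shortest run of characters up to " <number>", a line break,
# or the very end of the text. "." does not cross line breaks here.
USAGE_EXAMPLE = re.compile(r"(?<=[,.])[\w ]+?~.+?(?= \d+|\n|\Z)")


def strip_usage_examples(text: str) -> str:
    """Delete the usage examples (and their translations) that contain a tilde."""
    return USAGE_EXAMPLE.sub("", text)


sample = (
    "bankrupt\t1. batkın, to go ~ : batmak. 2. meteliksiz, "
    "an ~ intellectual: fikir yoksunu, 3. batırmak. "
    "His embezzlement ~ed the company : battı.\n"
    "bankruptly\t1. müflisane, to be ~ : yoksun olmak.\n"
)
print(strip_usage_examples(sample))
```

As with the Notepad++ regex, a definition whose examples are stripped up to the end of the line keeps the comma that preceded them.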
                  

                  Now, give these 3 regexes a try against your real text and verify if some problems still remain ;-))

                  See you later,

                  Cheers,

                  guy038

                  • glossar
                    glossar last edited by glossar

                    Hi guy,
                    Thank you so much! For the bankrupt entry, we almost got there! I tried the three regexes several times, I might still have missed something but there seems to be a tiny problem with the second bankrupt entry. The first regex won’t join the “fee” and “lings…” together. I introduced a second line-break in the second bankrupt entry, this time it fixed the first line-break and joined “fee” and “lings…” together but didn’t touch the second one. Hence I got the following results respectively:

                    bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, 2. meteliksiz, 3. yoksun, mahrum, düşkün, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek.
                    bankruptly	1. müflisane, iflâs ede-rek/etmişçesine.
                    bankruptable	1. müflisane, iflâs ede-rek/etmişçesine.
                    bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse,
                    lings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek.
                    bankruptly	1. müflisane, iflâs ede-rek/etmişçesine,
                    bankruptable	1. müflisane, iflâs ede-rek/etmişçesine,
                    
                    bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, 2. meteliksiz, 3. yoksun, mahrum, düşkün, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek.
                    bankruptly	1. müflisane, iflâs ede-rek/etmişçesine.
                    bankruptable	1. müflisane, iflâs ede-rek/etmişçesine.
                    bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse,
                    niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek.
                    bankruptly	1. müflisane, iflâs ede-rek/etmişçesine,
                    bankruptable	1. müflisane, iflâs ede-rek/etmişçesine,
                    

                    I also applied the regexes to a few other entries. They don’t seem to get the job done. Two things that I could point out:

                    • There may be only 1, or 2 or more suffixes (up to 5) within an entry consecutively, e.g “7. ~ly: (definition(s)), 8 ~able: (definition(s)), 9. ~ness: definition(s)).”

                    • A colon (:) may or may not follow the respective suffix, with or without one or more spaces before or after it. Only the numerals are consistent, i.e. each suffix is preceded by a numeral. Below are some of the possibilities, not all of them, but you will get the idea:
                      [number+dot]+(0 space)+(~suffix)+(0 space)+(0 colon)+(0 space)+definition(s)
                      [number+dot]+(0 space)+(~suffix)+(0 space)+(1 colon)+(0 space)+definition(s)
                      [number+dot]+(1 space)+(~suffix)+(0 space)+(0 colon)+(1 space)+definition(s)
                      [number+dot]+(1 space)+(~suffix)+(0 space)+(1 colon)+(1 space)+definition(s)
                      [number+dot]+(1 space)+(~suffix)+(1 space)+(0 colon)+(1 space)+definition(s)
                      [number+dot]+(1 space)+(~suffix)+(1 space)+(1 colon)+(1 space)+definition(s)
                      [number+dot]+(0 space)+(~suffix)+(1 space)+(0 colon)+(1 space)+definition(s)
                      [number+dot]+(0 space)+(~suffix)+(1 space)+(1 colon)+(1 space)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(0 space)+(0 colon)+(0 space)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(1 space)+(0 colon)+(0 space)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(1 space)+(1 colon)+(0 space)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(0 space)+(0 colon)+(1 space)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(1 space)+(0 colon)+(1 space)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(0 space)+(1 colon)+(1 space)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(1 space)+(1 colon)+(1 space)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(0 space)+(0 colon)+(2 or more spaces)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(1 space)+(0 colon)+(2 or more spaces)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(1 space)+(1 colon)+(2 or more spaces)+definition(s)
                      [number+dot]+(2 or more spaces)+(~suffix)+(1 space)+(0 colon)+(2 or more spaces)+definition(s)
                      …
                      …
                      and all other permutations :(
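
One possible way to cover these permutations with a single pattern is to make every space run and the colon optional. The Python sketch below is untested against the real dictionary; `SUFFIX_MARK` and `parse_suffix` are hypothetical names, and the one-or-two-digit limit comes from the earlier remark about the numerals.

```python
import re

# number (1 or 2 digits) + dot + any spaces + ~suffix + any spaces
# + optional colon + any spaces
SUFFIX_MARK = re.compile(r"\d{1,2}\. *~(\w+) *:? *")


def parse_suffix(fragment: str):
    """Return (suffix, definition) if the fragment starts with a suffix marker."""
    m = SUFFIX_MARK.match(fragment)
    if m is None:
        return None
    return m.group(1), fragment[m.end():]


for fragment in (
    "5.~ly:müflisane",
    "5. ~ly : müflisane",
    "12.   ~ness  :   müflisane",
    "5. ~ly müflisane",
):
    print(parse_suffix(fragment))
```

The one layout this cannot resolve is "no colon and no space" between suffix and definition, because the suffix and the first word of the definition then run together.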

                    • guy038
                      guy038 last edited by guy038

                      @glossar, and All,

                      OK ! So let’s just split the problem into smaller pieces and focus on the first S/R ;-))

                      You said :

                      there seems to be a tiny problem with the second bankrupt entry. The first regex won’t join the “fee” and “lings…” together

                      From my regex, it should !! Of course, I assume that the TAB character ( \t ) exists only after each dictionary header word !

                      So, first, could you verify that the \t char occurs right after each entry and never occurs elsewhere ?


                      Now, from this TEST_2 text, below, with some line breaks, between header words and additional line-breaks, added inside definitions ( 5 in the definition #1 , 1 in the definition #2 and 2 in the definition #3 ) :

                      bankrupt	1. huk.
                      
                      
                      
                      
                      
                      bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, bor
                      çlarını ödeye
                      meyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak.
                       He seems to be ~ of all kind fee
                      lings : Her türlü asil duygu
                      lardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteli
                      ksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir 
                      yoksunu. ~ of intelligence : akılsız, a moral
                       ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi. 5. ~ly : müflisane, iflâs ede-rek/etmişçesine. 6. ~able : müflisane, iflâs ede-rek/etmişçesine.
                      
                      
                      
                      
                      bankrupt	1. huk.
                      

                      With my first regex :

                      • SEARCH (\R)\R*(?=\w+\t)|\R(?=[^\t\r\n]+\R)

                      • REPLACE ?1\1

                      After clicking on the Replace All button, you should get this text :

                      bankrupt	1. huk.
                      bankrupt	1. huk. batkın, müflis, batmış, iflâs etmiş, borçlarını ödeyemeyen kimse, to go ~ : batmak, iflâs etmek, to be ~ : yoksun olmak. He seems to be ~ of all kind feelings : Her türlü asil duygulardan yoksun görünüyor. fraudulent/negli-gent - : kötü niyetli batkın, hileli müflis, 2. meteliksiz, 3. yoksun, mahrum, düşkün, an ~ intellectual: fikir yoksunu. ~ of intelligence : akılsız, a moral ~ : ahlâk düşkünü, to be - of manners : terbiyeden yoksun olmak, 4. batırmak, iflâs ettirmek, yoksun bırakmak, mahvetmek. His embezzlement ~ed the company : Zimmetine para geçirmesi şirketi batırdı/iflâs ettirdi. 5. ~ly : müflisane, iflâs ede-rek/etmişçesine. 6. ~able : müflisane, iflâs ede-rek/etmişçesine.
                      bankrupt	1. huk.
                      

                      Which proves that unnecessary line-breaks have been removed ! Could you confirm that this is the text obtained after the regex S/R ?

                      BR

                      guy038

                      • glossar
                        glossar last edited by glossar

                        Hi guy,

                        Just a quick confirmation: I’ve reproduced the same results with the TEST_2 text and the previous ones. After copying and pasting, I simply added a line-break at the end of the very last line by hitting Enter. This last line-break fixed the problem. I’ll continue to apply the regexes to several other entries and I’ll report any problems I encounter.

                        Thank you so much for your time and effort! I do much appreciate it!
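
The effect of that final line-break can be reproduced with a small Python sketch. It uses a `\n`-only adaptation of the first regex (Python has no `\R`), and the variant that also accepts the end of the text is an assumption added here, not something taken from the thread. All names are illustrative.

```python
import re

# Adaptation of the first S/R: the second alternative only joins a broken line
# when the continuation line itself ends with a line break.
STRICT = re.compile(r"(\n)\n*(?=\w+\t)|\n(?=[^\t\n]+\n)")

# Assumed variant: also accept the end of the text after the continuation line.
TOLERANT = re.compile(r"(\n)\n*(?=\w+\t)|\n(?=[^\t\n]+(?:\n|\Z))")


def join_lines(pattern: re.Pattern, text: str) -> str:
    """Apply the S/R: keep group 1 when it matched, otherwise delete the break."""
    return pattern.sub(lambda m: m.group(1) or "", text)


without_eol = "bankrupt\t1. all kind fee\nlings : Her türlü"
with_eol = without_eol + "\n"

print(repr(join_lines(STRICT, without_eol)))    # last line has no line break
print(repr(join_lines(STRICT, with_eol)))       # last line ends with a line break
print(repr(join_lines(TOLERANT, without_eol)))  # variant that tolerates its absence
```

With the strict pattern, the split inside the very last line of the text is left alone unless that line ends with a line break, which matches the behaviour reported above.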
