    How do I merge two or more consecutive lines into one?

    • guy038G
      guy038
      last edited by guy038

      Hi, glossar and Vasile,

      Hum… Very strange ! As for me, it works quite fine ! So, Vasile, let’s recapitulate :

      • glossar, from your original text, below :

        Line 344: bördelversuch TAB flanging test
        Line 345: bördelversuch TAB folding test
        …
        Line 28872: führungszapfen TAB guide pilot
        Line 28873: führungszapfen TAB guide pin
        Line 28874: führungszapfen TAB pilot pin
        …
        Line 659368: horizontal geteilt TAB horizontally divided
        Line 659369: horizontal geteilt TAB horizontally split

      • And, taking into account that :

        • The string TAB, with a space character before and after, refers to a single tabulation character \t , of code \x09

        • The prefix Line #####: , which begins each line for information only, must be deleted

      We finally get the text to work on :

      bördelversuch	flanging test
      bördelversuch	folding test
      
      führungszapfen	guide pilot
      führungszapfen	guide pin
      führungszapfen	pilot pin
      
      horizontal geteilt	horizontally divided
      horizontal geteilt	horizontally split
      

      Now,

      • Move the caret back to the very beginning of the first line

      • Open the Replace dialog ( Ctrl + H )

      • In the Find what zone, type the regex (?-s)^((.+\t).+)\R\2(.+)

      • In the Replace with zone, type the regex \1, \3 , with a SPACE character, after the comma symbol

      • Select the Regular expression search mode

      • Uncheck all the other options

      • Click THREE times, on the Replace All button, till the message Replace All: 0 occurrences were replaced is displayed, in blue, at the bottom of the Replace dialog

      You should get, as expected, the replaced text below :

      bördelversuch	flanging test, folding test
      
      führungszapfen	guide pilot, guide pin, pilot pin
      
      horizontal geteilt	horizontally divided, horizontally split
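
For anyone who would rather script this than click Replace All repeatedly, the same search/replace can be sketched in Python. This is only a rough translation under stated assumptions: Python's re module supports neither \R nor a global (?-s), so [^\r\n] stands in for the dot and (?:\r\n|\n|\r) for \R. The name merge_lines is made up for this illustration.

```python
import re

# Assumed translation of (?-s)^((.+\t).+)\R\2(.+) for Python's `re` engine:
#   [^\r\n]        replaces "." so a match never crosses a line break
#   (?:\r\n|\n|\r) replaces \R (any single line break)
PATTERN = re.compile(
    r"^(([^\r\n]+\t)[^\r\n]+)(?:\r\n|\n|\r)\2([^\r\n]+)",
    re.MULTILINE,
)

def merge_lines(text):
    """Repeat the replacement until a pass changes nothing.

    Returns the merged text and the number of passes that replaced something,
    mirroring "click Replace All until 0 occurrences were replaced".
    """
    passes = 0
    while True:
        text, count = PATTERN.subn(r"\1, \3", text)
        if count == 0:
            return text, passes
        passes += 1

if __name__ == "__main__":
    sample = (
        "bördelversuch\tflanging test\n"
        "bördelversuch\tfolding test\n"
        "\n"
        "führungszapfen\tguide pilot\n"
        "führungszapfen\tguide pin\n"
        "führungszapfen\tpilot pin\n"
    )
    merged, passes = merge_lines(sample)
    print(merged)
    print("passes that replaced something:", passes)
```

Each pass merges lines pairwise, so a run of three identical headwords needs two effective passes, and longer runs need more.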
      

      Cheers,

      guy038

      • Vasile CarausV
        Vasile Caraus
        last edited by Vasile Caraus

        Hello Guy, nope, it is not working. Please take a look at this screenshot:

        https://snag.gy/6ts5Gy.jpg

        or here

        https://snag.gy/O0ceRV.jpg

        • glossarG
          glossar
          last edited by

          Guy - You’re the man! :) Thanks a million! It works! I can’t believe it, but it works!

          Thank you!

          By the way, I like the way you write, your writing style, and enjoy reading your posts! :)

          • glossarG
            glossar
            last edited by

            No, Vasile, it works!

            Here is my screenshot! :)

            https://snag.gy/V6acUA.jpg

            By the way, this “Snaggy” website is cool! Posting a screenshot couldn’t indeed be easier! I have bookmarked it! Thank you! :)

            And finally, I find it a silly practice that new users have to wait for 1200 seconds in order for them to submit their second post!

            • guy038G
              guy038
              last edited by guy038

              Hello, Vasile,

              Ah, yes ! Quite weird ! From your print screen pictures, everything seems OK : We, both, have the same fields and options and, from the status bar, our encoding and line breaks are also identical !!??

              Moreover, I verified that Glossar’s text, inserted in your new tab, is displayed exactly like my text ! This implies that you correctly inserted the tabulation characters ( 4 spaces wide, by default, like mine ) !!??

              So, the only thing that could explain the search failure should be, that you, probably, inserted some invisible character(s) in the search regex ?

              But, I must admit that I’m really annoyed at not being able to point out the true reason for your N++'s search behaviour !

              See you later…

              Cheers,

              guy038

              P.S. :

              Glossar, I just saw your reply to Vasile. Quite pleased that it works as expected, on your configuration !

              BTW, could some of you test my regex ? Maybe it will help us to identify the problem :-)

              One more point, Vasile, which N++ version are you using ?

              • Vasile CarausV
                Vasile Caraus
                last edited by

                hello. I am using v7.1, no update available.

                But I had an error yesterday morning, something about N++ having to update Plugin Manager and an Npp plugin; I don’t remember very well. I will restart the computer tomorrow and test again.

                • Vasile CarausV
                  Vasile Caraus
                  last edited by

                  No, after the restart nothing changed. I also installed v.722, and it still doesn’t work. What kind of bug is this?

                  • Scott SumnerS
                    Scott Sumner @guy038
                    last edited by

                    @guy038 said:

                    Click THREE times, on the Replace All button, till the message Replace All: 0 occurrences were replaced is displayed, in blue, at the bottom of the Replace dialog

                    Out of curiosity I tried it, and found that I had to press Replace All only TWO times to get a complete replacement on these three groupings of text. The first press of Replace All does the first and third sets of data; the second press does the second set. However, in the end, it changed the data as expected.

                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, Scott, glossar and vasile,

                      Yes, Scott, two times are enough, indeed ! However, I replied to glossar for the general case where you do not know, exactly, the maximum number of lines with “identical beginnings”, because of a huge file, for instance !

                      In that case, it’s better to click on the Replace All button till the message 0 occurrences were replaced appears ! Indeed, as long as you obtain the message N occurrences were replaced, with N > 0, you cannot know whether any occurrences remain to be replaced the next time :-))
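
Outside Notepad++, the "how many passes ?" question can be sidestepped by grouping consecutive lines on their headword, which takes a single pass whatever the length of a run. Below is a minimal Python sketch of that different technique (itertools.groupby, not the regex above). It assumes sorted input with the headword before the first tab; headword and merge_by_headword are hypothetical helper names, not anything from this thread.

```python
from itertools import groupby

def headword(line):
    """Return the text before the first tab, or None for blank/tab-less lines."""
    head, tab, _ = line.partition("\t")
    return head if tab else None

def merge_by_headword(lines):
    """Join consecutive lines sharing a headword into 'head<TAB>def1, def2, ...'."""
    merged = []
    for head, group in groupby(lines, key=headword):
        group = list(group)
        if head is None:
            merged.extend(group)  # blank or tab-less lines pass through unchanged
        else:
            definitions = [line.partition("\t")[2] for line in group]
            merged.append(head + "\t" + ", ".join(definitions))
    return merged

if __name__ == "__main__":
    lines = [
        "führungszapfen\tguide pilot",
        "führungszapfen\tguide pin",
        "führungszapfen\tpilot pin",
    ]
    print("\n".join(merge_by_headword(lines)))
```

Unlike the regex, this splits on the first tab of each line, so lines containing several tabs would be handled differently.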

                      Cheers,

                      guy038

                      • Vasile CarausV
                        Vasile Caraus @Scott Sumner
                        last edited by

                        @Scott-Sumner

                        I pressed “Replace all” 50 times, and nothing happens. There is a bug somewhere…

                        • Vasile CarausV
                          Vasile Caraus @Frank Orellana
                          last edited by

                          @Frank-Orellana

                          this is working fine !

                          • Vasile CarausV
                            Vasile Caraus
                            last edited by

                            Hello guy038 and all friends. Maybe you can improve my regex to solve the problem another way.

                            The regex below merges all the lines into one. The problem is that it does not remove the words that repeat.

                            Search:
                            \s+(.*?)

                            Replace with:
                            (a single space character)

                            • Vasile CarausV
                              Vasile Caraus
                              last edited by Vasile Caraus

                              Here is another good solution that works:

                              Find What: ^(\w+\s+\w+\s*)(.*)\n\1
                              Replace With: \1\2,
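
A quick check of this second pattern in Python suggests a caveat, with the reservation that Python's re engine may not behave exactly like the one in Notepad++. Group 1 is "word, whitespace, word", so for a two-word headword such as horizontal geteilt it captures the headword and the tab, as intended. For a one-word headword, though, it reaches past the tab into the definition, and whatever the two definitions share at their start is dropped from the second line. The sample lines come from glossar's text; try_alternative is a name invented for this sketch, and the replacement is assumed to end with a space.

```python
import re

# Vasile's pattern, compiled as-is; MULTILINE makes ^ match at each line start.
ALT = re.compile(r"^(\w+\s+\w+\s*)(.*)\n\1", re.MULTILINE)

def try_alternative(text):
    """Apply the replacement \\1\\2 followed by a comma and a space, once."""
    return ALT.sub(r"\1\2, ", text)

if __name__ == "__main__":
    # Two-word headword: group 1 is exactly "horizontal geteilt<TAB>".
    print(repr(try_alternative(
        "horizontal geteilt\thorizontally divided\n"
        "horizontal geteilt\thorizontally split\n"
    )))
    # One-word headword: group 1 becomes "führungszapfen<TAB>guide ",
    # so the second "guide " is consumed together with the headword.
    print(repr(try_alternative(
        "führungszapfen\tguide pilot\n"
        "führungszapfen\tguide pin\n"
    )))
```

Anchoring the captured prefix on the tab, as in (?-s)^((.+\t).+)\R\2(.+), avoids this, because the repeated part can then never extend into the definition.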

                              • ?
                                A Former User @guy038
                                last edited by

                                @guy038 How can I combine specific lines at a given character, i.e. lines 2 - 7 should be merged into line 2 only, and the same for the others?

                                e1b28aa3-eac0-459b-b0aa-ca391816e22e-image.png

                                • glossarG
                                  glossar @A Former User
                                  last edited by

                                  @guy038

                                  Hello guy,
                                  I’ve had to revisit this topic because I have run into a problem with the regex you provided above, for which I remain grateful! Oddly, the problem occurs when I join two of the files I’m working with (same line structure, i.e. text-tabulation-text, and similar contents), whereas the regex has so far worked like a charm. When I apply the regex to these files individually it still works; it fails only after I join (two of) them. By joining, I mean adding the whole content of one file to another, sorting it in Excel, and pasting it back into the txt file. The problem is that the regex consumes/deletes all the text except one line (leaving only a trailing comma behind), usually on the second “Replace all” click, so the “0 occurrences were replaced” message never appears.

                                  What might cause it?

                                  I use NP++ v. 7.9.3 (64-bit)

                                  Many thanks in advance!
                                  glossar

                                  • guy038G
                                    guy038
                                    last edited by guy038

                                    Hi, @glossar,

                                    As usual, could you provide the text ( or part of it ) that you get back from Excel, and saved as text in N++ and for which the regex S/R, below, wrongly removes almost everything ?

                                    SEARCH (?-s)^((.+\t).+)\R\2(.+)

                                    REPLACE \1,\x20\3

                                    BR

                                    guy038

                                    • glossarG
                                      glossar @guy038
                                      last edited by glossar

                                      @guy038 said in How do I merge two or more consecutive lines into one?:

                                      \1,\x20\3

                                      Hello guy,

                                      This is what it leaves behind:

                                      triplex plunger pump	{TECH&ANGEWANDTE} <convey> Dreiplungerpumpe f.; 3-Plunger-Pumpe f.; Triplex-Plungerpumpe f.,
                                      

                                      There is a tabulation right after the word “pump” (sorry, I can’t make it visible) and there is no CRLF at the end of the line.

                                      It may be useful or necessary for trouble-shooting to put this line in “context” (as we often use it), so here it is with few lines before and after:

                                      triplex bundle conductor	{ELEKTROTECH} Bündelleiter m. aus drei Teilleitern, Dreierbündel n.
                                      triplex chain	{TECH&ANGEWANDTE} <driv> Triplexkette f.
                                      triplex milling machine	{TECH&ANGEWANDTE} <mach/tool> Dreispindelfräsmaschine f.
                                      triplex operation	{ELEKTROTECH} NRT Triplexbetrieb m.
                                      triplex plunger pump	{TECH&ANGEWANDTE} <convey> Dreiplungerpumpe f.; 3-Plunger-Pumpe f.; Triplex-Plungerpumpe f.
                                      triplex pump	{TECH&ANGEWANDTE} <convey> Dreizylinderpumpe f.; Dreikolbenpumpe f.; 3-Zylinder-Pumpe f advt; Triplexpumpe f.
                                      triplex ram pump	{TECH&ANGEWANDTE} <convey> Dreiplungerpumpe f.; 3-Plunger-Pumpe f.; Triplex-Plungerpumpe f.
                                      triple-X syndrome	{MEDIZIN} XXX-Syndrom n., X-Trisomie f. (Chromosomenanomalie)
                                      triplex system	{TELEKOMM} Triplexsystem n.
                                      triplex winding	{TECH&ANGEWANDTE} <el> Dreifachwicklung f.; Dreilagenwicklung f.; Dreischleifenwicklung f.
                                      triplex-coated particle	{TECH} n. NUC TECH dreifach beschichtetes Teilchen nt.
                                      

                                      Again, there is a tabulation just before the char “{” on each line, with CRLF at the end.

                                      And a bit more food for troubleshooting: I’m trying to make a “big mama” dictionary file from a series of (smaller) ones, with of course no duplicate headwords, hence the regex above. So far I have joined (in the above sense) several of them, but there are more to join. I simply add each new file to the ones already joined, gluing one after another, to get the “big mama” once all the files are joined, again with no duplicate headwords (everything before the tabulation is the headword). Suppose I have three files from which I’d get a “big mama”. I take the first one, sort it in Excel, paste it back into the txt file and apply the regex, then repeat the same process for the second one. Now I copy the whole content of the second file, paste it into the first one, sort the combined text in Excel and apply the regex. Now I have a text with no duplicate headwords that consists of files #1 and #2. The last thing I’d do is copy the content of the third file, sort it in Excel, paste it back into a text file, apply the regex to ensure it contains no duplicate headwords, then copy and paste it into the “joined” file (File1 + File2) and repeat the same process. Finally I have the big mama!

                                      As I said above, the regex works on each separate file, it seems it works as long as the text stays on the same file, but not after transferring one text from one file to another, which at first made me think that it might be with the different encoding issue. But I have checked the encodings of previously successfully joined/added files and verified that encoding doesn’t cause the problem (at least not when one file is UTF-8 and the other is UTF-8-BOM encoded.)

                                      Now the only thing I could suspect is some char (a non-alphanumeric char or a char that falls beyond coverage of the regex). Since there are thousands of lines in each file, I can’t manually or visually go through to spot them if there is any.
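
With thousands of lines, a small script can do the visual inspection instead. The Python sketch below flags lines that do not have exactly one tab, or that contain invisible control/format characters (for example a stray carriage return or a zero-width space). It is only a diagnostic aid under the assumption that every entry should look like headword, TAB, definition; suspicious_lines is a hypothetical helper name, and it is not claimed that such characters are the actual cause of the problem.

```python
import unicodedata

# Unicode categories treated as "invisible" here: control, format,
# line separator and paragraph separator characters.
INVISIBLE = {"Cc", "Cf", "Zl", "Zp"}

def suspicious_lines(text):
    """Return (line_number, reason) pairs for lines that break the expected shape."""
    findings = []
    # Split on "\n" only, so that other separators stay inside the line and get flagged.
    for number, raw in enumerate(text.split("\n"), start=1):
        line = raw[:-1] if raw.endswith("\r") else raw  # tolerate CRLF endings
        if not line:
            continue  # blank lines cannot match the merge regex anyway
        tabs = line.count("\t")
        if tabs != 1:
            findings.append((number, f"{tabs} tab(s)"))
        for ch in line:
            if ch != "\t" and unicodedata.category(ch) in INVISIBLE:
                findings.append((number, f"U+{ord(ch):04X}"))
    return findings

if __name__ == "__main__":
    # "dictionary.txt" is a placeholder path for the joined file.
    with open("dictionary.txt", encoding="utf-8-sig", newline="") as handle:
        for number, reason in suspicious_lines(handle.read()):
            print(f"line {number}: {reason}")
```

Opening the file with newline="" keeps the original line endings, so a lone carriage return in the middle of a line is reported as U+000D.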

                                      Sorry for making this so long. I just wanted to give you as much data as possible so that you can troubleshoot.

                                      Thanks,
                                      glossar

                                      • guy038G
                                        guy038
                                        last edited by guy038

                                        Hello, @glossar,

                                        I do not understand. The text that you provided :

                                        triplex bundle conductor	{ELEKTROTECH} Bündelleiter m. aus drei Teilleitern, Dreierbündel n.
                                        triplex chain	{TECH&ANGEWANDTE} <driv> Triplexkette f.
                                        triplex milling machine	{TECH&ANGEWANDTE} <mach/tool> Dreispindelfräsmaschine f.
                                        triplex operation	{ELEKTROTECH} NRT Triplexbetrieb m.
                                        triplex plunger pump	{TECH&ANGEWANDTE} <convey> Dreiplungerpumpe f.; 3-Plunger-Pumpe f.; Triplex-Plungerpumpe f.
                                        triplex pump	{TECH&ANGEWANDTE} <convey> Dreizylinderpumpe f.; Dreikolbenpumpe f.; 3-Zylinder-Pumpe f advt; Triplexpumpe f.
                                        triplex ram pump	{TECH&ANGEWANDTE} <convey> Dreiplungerpumpe f.; 3-Plunger-Pumpe f.; Triplex-Plungerpumpe f.
                                        triple-X syndrome	{MEDIZIN} XXX-Syndrom n., X-Trisomie f. (Chromosomenanomalie)
                                        triplex system	{TELEKOMM} Triplexsystem n.
                                        triplex winding	{TECH&ANGEWANDTE} <el> Dreifachwicklung f.; Dreilagenwicklung f.; Dreischleifenwicklung f.
                                        triplex-coated particle	{TECH} n. NUC TECH dreifach beschichtetes Teilchen nt.
                                        

                                        does not help ! Indeed, if I apply the regex S/R against your example, it does not find any match !

                                        I need the initial text, which is practically almost removed AFTER processing the regex S/R

                                        If you prefer, you may send it to me, by e-mail. Here is my temporary e-mail address :

                                        BR

                                        guy038

                                        • Alan KilbornA
                                          Alan Kilborn @glossar
                                          last edited by

                                          Here is my temporary e-mail address

                                          @glossar

                                          Do please take advantage of that kindly offered suggestion!

                                          • glossarG
                                            glossar @guy038
                                            last edited by guy038

                                            @guy038 said in [How do I merge two or more consecutive lines into one?]

                                            If you prefer, you may send it to me, by e-mail. Here is my temporary e-mail address :

                                            BR

                                            guy038

                                            Just sent it to the above address.

                                            Thank you!

                                            The Community of users of the Notepad++ text editor.