    New user having trouble getting line/blank operations to work

    33 Posts 5 Posters 6.0k Views
    • Alan KilbornA
      Alan Kilborn @motreo
      last edited by

      @motreo said in New user having trouble getting line/blank operations to work:

      I don’t know anything about Unicode

      This may be a problem on a bigger scale, given what you seem to be doing. Maybe best to go off and do some learning.

      • motreoM
        motreo @Alan Kilborn
        last edited by

        @alan-kilborn that’s it - all the spaces where there isn’t any dot are highlighted when doing a search for \xa0

        • motreoM
          motreo @PeterJones
          last edited by

          @peterjones “I recommend you just do a search for \xA0 and replace with \x20, which will replace all NBSP with normal spaces.”

          Worked like a charm! And allows me to get rid of those extra spaces using search/replace :)
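
For anyone scripting the same cleanup outside Notepad++, a minimal Python sketch of the \xA0 to \x20 replacement could look like the following. The function name `nbsp_to_space` is made up for illustration and is not part of Notepad++.

```python
import re

def nbsp_to_space(text: str) -> str:
    """Replace every no-break space (U+00A0) with a normal space (U+0020)."""
    # Same find/replace pair as in the Notepad++ Replace dialog: \xA0 -> \x20.
    return re.sub(r"\xa0", "\x20", text)

sample = "billions\xa0of\xa0\xa0worlds"
print(repr(nbsp_to_space(sample)))
```

Note that this only swaps the character; runs of several spaces stay as they are and still need a separate pass.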

            • motreoM
              motreo @motreo
              last edited by

              @guy038 Per the recommendation of @peterjones, I got rid of all the funky non-normal blank spaces by replacing \xa0 with \x20. Now that I’m left with a transcript with CRLF line endings and only normal blank spaces, do you know what expression can be used to join only consecutive lines + lines separated by a single blank line?

              • PeterJonesP
                PeterJones @motreo
                last edited by PeterJones

                @motreo ,

                single spaced
                will be joined
                
                as will double-spaced
                
                
                but not triple spaced
                
                • FIND = (?<![\r\n])(\R){1,2}(?!\R)
                • REPLACE = \x20
                • REPLACE ALL

                This says “for matches that don’t have a \r or \n before them, match 1 or 2 newline sequences that aren’t followed by another newline” and “replace with a space”. This collapses lines that are single spaced or double spaced into one line, but leaves triple spaced or wider gaps unedited.

                This is just one solution that seems to fit your description. TIMTOWTDI.
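
The behaviour described above can be tried out in a rough Python port. This is a sketch under assumptions, not the Notepad++ engine: Python's `re` has no `\R`, so `\r?\n` stands in for it (LF or CRLF only, no lone CR), and `join_lines` is a hypothetical helper name.

```python
import re

# Python's re has no \R, so r"\r?\n" stands in for it here (assumption:
# the text uses LF or CRLF line endings, never a lone CR).
JOIN = re.compile(r"(?<![\r\n])(?:\r?\n){1,2}(?!\r?\n)")

def join_lines(text: str) -> str:
    """Replace runs of one or two line breaks with a single space."""
    return JOIN.sub(" ", text)

clean = ("single spaced\r\nwill be joined\r\n\r\n"
         "as will double-spaced\r\n\r\n\r\nbut not triple spaced")
print(repr(join_lines(clean)))

# A "blank" line that actually holds a space splits a run of three
# line breaks into 1 + 2, so both pieces match and everything is joined.
sneaky = "a \r\n \r\n\r\nb"
print(repr(join_lines(sneaky)))
```

The second sample hints at why a transcript with trailing spaces and whitespace-only lines can collapse into a single paragraph: the regex counts consecutive line breaks only, and a stray space interrupts the run.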

                ----

                Useful References

                • Please Read Before Posting
                • Template for Search/Replace Questions
                • FAQ: Where to find regular expressions (regex) documentation
                • Notepad++ Online User Manual: Searching/Regex
                • Terry RT
                  Terry R @motreo
                  last edited by

                  @motreo said in New user having trouble getting line/blank operations to work:

                  do you know what expression can be used to join only consecutive lines + lines separated by a single blank line?

                  I will repeat my request. The previous time you showed “real text”, it was after edits had already been made. If you could provide the “original” text in the same manner, you might get a response that fixes ALL of them in 1 go. If not, then at least once you have the process sorted you can record the steps as a macro, which can be saved and played back whenever you have a transcript to process.

                  Terry

                  • motreoM
                    motreo @Terry R
                    last edited by

                    @terry-r Sorry, I think I’m still a bit confused by what you mean when you say “real text”. Do you mean something like this?

                    Imagine the day a civilization discovers the  
                    
                    starry night sky above contains billions of 
                    billions of worlds awaiting their arrival.  
                    
                    Now imagine the day they realize 
                    those voyages will never be made.
                     
                    
                    So earlier this week we were talking about Kessler 
                    Syndrome, collision cascades around planets that  
                    

                    I’m not sure if the original text needs to be the starting point - now that I know how to quickly get rid of no-break spaces and change line endings to CRLF, wouldn’t it make more sense to treat that as my starting point?

                    • Terry RT
                      Terry R @motreo
                      last edited by Terry R

                      @motreo said in New user having trouble getting line/blank operations to work:

                      wouldn’t it make more sense to treat that as my starting point?

                      If you want to start at the new starting point that is OK by me. But what I’m saying is that you have this original transcript that includes NBSP (non-breaking spaces) and LF without CR codes.

                      Regular expressions (regex) are a wondrous thing. They magically fix all that, well maybe not magically but they are very powerful if coded well. There’s a real chance 1 regex can do it all! You’d open the “original” transcript in Notepad++, hit a macro and voila, the result appears as you want it.

                      Terry

                      PS thanks for the latest example. That format allows us (the coders) to take a stab at the almost-real data, albeit without NBSP, and give you a solution.

                      • motreoM
                        motreo @PeterJones
                        last edited by

                        @peterjones

                        single spaced
                        will be joined
                        
                        as will double-spaced
                        
                        
                        but not triple spaced
                        

                        What does triple spaced mean in this context? I ran the expression and everything was merged into one long, single paragraph. There are extra spaces left between words (between 2-5 spaces) but those are easy to edit out with search/replace. Let me post a before and after:

                        Before

                        Imagine the day a civilization discovers the  
                        
                        starry night sky above contains billions of 
                        billions of worlds awaiting their arrival.  
                        
                        Now imagine the day they realize 
                        those voyages will never be made.
                         
                        
                        So earlier this week we were talking about Kessler 
                        Syndrome, collision cascades around planets that 
                        

                        After

                        Imagine the day a civilization discovers the   starry night sky above contains billions of  billions of worlds awaiting their arrival.   Now imagine the day they realize  those voyages will never be made.   So earlier this week we were talking about Kessler  Syndrome, collision cascades around planets that   
                        
                        • Terry RT
                          Terry R @Terry R
                          last edited by

                          @terry-r said in New user having trouble getting line/blank operations to work:

                          There’s a real chance 1 regex can do it all! You’d open the “original” transcript in Notepad++, hit a macro and voila, the result appears as you want it.

                          @motreo
                          I think I have the 1 regex to do:

                          1. convert NBSP (\xa0) into ordinary spaces (\x20).
                          2. collapse runs of multiple spaces into 1 ordinary space.
                          3. remove 1 or 2 line feeds, with possible spaces/NBSP in between (\n seen as LF).
                          4. turn any 3 line feeds together (with possible spaces/NBSP in between) into a single CRLF (Windows carriage return/line feed).

                          So with your original transcript you should be able to open that in Notepad++, copy the next 2 bits of code (highlighted in red, just use copy and paste) into the appropriate field in the Replace function and click on Replace All.

                          Find What:(\xa0|\x20)(?=([\xa0\x20])?)|\n([\xa0\x20]*\n([\xa0\x20]*\n)?)?
                          Replace With:(?{1}(?{2}:\x20))(?{4}\r\n)

                          It will look complicated, don’t worry about that at the moment. I will describe what it is doing, once you confirm it works as expected.

                          Let us know how it went. As you only provided a small sample, there is a chance it may miss something you were unaware of; this happens.
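
To make the conditional replacement easier to follow, here is a hedged Python port of the Find What / Replace With pair above. The pattern is copied unchanged; the replacement is rewritten as a function because Python has no `(?{1}...)` conditional syntax. The names `terry_replace` and `clean_lf` are invented for this sketch.

```python
import re

# Find What pattern from the post, unchanged. Group 1 is a space/NBSP,
# group 2 is set when another space/NBSP follows it, and group 4 is set
# when a third line feed was matched.
FIND = re.compile(
    r"(\xa0|\x20)(?=([\xa0\x20])?)|\n([\xa0\x20]*\n([\xa0\x20]*\n)?)?"
)

def terry_replace(m: re.Match) -> str:
    """Mimic Replace With: (?{1}(?{2}:\\x20))(?{4}\\r\\n)."""
    if m.group(1) is not None:
        # Drop a space that is followed by another space; keep the last one.
        return "" if m.group(2) is not None else " "
    # 1 or 2 line feeds vanish; 3 line feeds become one CRLF.
    return "\r\n" if m.group(4) is not None else ""

def clean_lf(text: str) -> str:
    return FIND.sub(terry_replace, text)

lf_sample = "the\xa0\xa0\n\xa0\nstarry night \nbillions.\n \n\nNow"
print(repr(clean_lf(lf_sample)))

# On CRLF input only the \n is consumed, so a bare \r is left behind.
print(repr(clean_lf("a\r\nb")))
```

In this port, joined lines are separated only by whatever trailing space the original line carried, since 1 or 2 line feeds are removed rather than replaced. The last call also suggests what to expect on CRLF text: the pattern looks for `\n` only, so the `\r` characters survive.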

                          Terry

                          • motreoM
                            motreo @Terry R
                            last edited by

                            @terry-r Just gave it a try, using a version of the transcript with CRLF line endings and NBSPs intact (I just copied and pasted the text into Notepad++ rather than downloading the transcript file and selecting Open in Notepad++, which yields LF line endings and gives the error message that the document is read only when trying to run expressions).

                            The NBSPs were removed and CRLF line endings changed to just CR, but the lines themselves were unchanged/no lines were merged. I checked Regular expression and Wrap around and left the Transparency box blank. Thanks again for helping me with this, I really appreciate it.

                            • Terry RT
                              Terry R @motreo
                              last edited by Terry R

                              @motreo said in New user having trouble getting line/blank operations to work:

                              using a version of the transcript with CRLF line endings and NBSPs intact (I just copied and pasted the text into Notepad++ rather than downloading the transcript file and selecting Open in Notepad++, which yields LF line endings and gives the error message that the document is read only when trying to run expressions).

                              I meant for you to use the “original” version with the LF. If you would rather use an already altered version, you need to let us know. Regexes are created specifically for a task, using data in a pre-determined format. Altering anything in the data will very likely throw out the result, as you saw.

                              What you could try is this altered Find What code:
                              (\xa0|\x20)(?=([\xa0\x20])?)|\R([\xa0\x20]*\R([\xa0\x20]*\R)?)?
                              however I haven’t tested it
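
As a rough way to check the altered pattern outside Notepad++, the sketch below ports it to Python. It is an approximation only: `\R` does not exist in Python's `re`, so `\r?\n` is substituted (an assumption covering LF and CRLF but not a lone CR), and the function names are hypothetical.

```python
import re

# Altered Find What pattern with \R approximated by \r?\n (assumption).
FIND_ANY_EOL = re.compile(
    r"(\xa0|\x20)(?=([\xa0\x20])?)"
    r"|\r?\n([\xa0\x20]*\r?\n([\xa0\x20]*\r?\n)?)?"
)

def replace_any_eol(m: re.Match) -> str:
    if m.group(1) is not None:
        # Keep only the last space of a run of spaces/NBSP.
        return "" if m.group(2) is not None else " "
    # 3 line breaks become one CRLF; 1 or 2 are removed.
    return "\r\n" if m.group(4) is not None else ""

def clean_any_eol(text: str) -> str:
    return FIND_ANY_EOL.sub(replace_any_eol, text)

print(repr(clean_any_eol("a \r\n \r\nb")))
print(repr(clean_any_eol("x.\r\n \r\n\r\nNow")))
```

Under these assumptions the CRLF sample is joined and no stray `\r` is left, but that says nothing definitive about how Notepad++ itself treats the `\R` version.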

                              Terry

                              PS I actually explained what my regex was doing, I thought it would have been obvious it was concerned with the LF version.

                              • guy038G
                                guy038
                                last edited by guy038

                                Hello, @motreo, @terry-r, @alan-kilborn, @peterjones and All,

                                OK, I understood… Each time a line-break occurs in @motreo’s text, it may be preceded by possible horizontal space characters, like \x20 , \xa0 or \t. Moreover, if some horizontal space chars are found between words, they may be different from a simple space char. So, a solution could be :

                                • Open the Replace dialog ( Ctrl + H )

                                • SEARCH (\h*\R){3,}|(\h*\R){1,2}|(?![\r\n])([\t\xa0]+|\x20{2,})(?!\R)

                                • REPLACE (?1\r\n\r\n)(?2\x20)?3\x20    if the line endings must be CRLF, after replacement

                                • OR

                                • REPLACE (?1\n\n)(?2\x20)?3\x20    if the line endings must be LF, after replacement

                                • Untick all box options

                                • Tick the Wrap around option

                                • Select the Regular expression search mode

                                • Click once on the Replace All button or several times, till the end of process, on the Replace button

                                • Hit the ESC button to close the Replace dialog


                                Note: You may have to delete the last space character at the very end of your text. Not a big task, anyway!
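
The three alternatives of the SEARCH regex can be sketched in Python as below. This is an illustrative port, not the Notepad++ behaviour itself: `\h` is approximated by `[ \t\xa0]` and `\R` by `\r?\n` (both assumptions), named groups replace the numbered ones, and `tidy_transcript` is a made-up name.

```python
import re

H = r"[ \t\xa0]"   # stand-in for \h (assumption: only these three occur)
NL = r"\r?\n"      # stand-in for \R (assumption: LF or CRLF only)

SEARCH = re.compile(
    rf"(?P<para>(?:{H}*{NL}){{3,}})"      # 3+ line breaks: paragraph boundary
    rf"|(?P<join>(?:{H}*{NL}){{1,2}})"    # 1-2 line breaks: join with a space
    rf"|(?![\r\n])(?P<gap>[\t\xa0]+|\x20{{2,}})(?!{NL})"  # odd or repeated spacing
)

def tidy_transcript(text: str, eol: str = "\r\n") -> str:
    """Keep paragraph breaks as one blank line; join everything else."""
    def repl(m: re.Match) -> str:
        if m.group("para") is not None:
            return eol + eol   # like (?1\r\n\r\n) or (?1\n\n)
        return " "             # like (?2\x20) and ?3\x20
    return SEARCH.sub(repl, text)

sample = ("discovers the\xa0 \r\n\r\nstarry night \r\nsky.\r\n \r\n\r\n"
          "So  earlier\tthis")
print(repr(tidy_transcript(sample)))
print(repr(tidy_transcript("end\r\n")))   # trailing break becomes a space
```

The last line shows where the leftover space at the very end of the text comes from: a final line break falls under the "1 or 2 line breaks" alternative and is turned into a space like any other.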

                                Best Regards,

                                guy038

                                • motreoM
                                  motreo @guy038
                                  last edited by

                                  @guy038 It worked!!! I’m so glad - thanks so much for your help with this! :)

                                  • motreoM
                                    motreo @Terry R
                                    last edited by

                                    @terry-r For some reason I can’t use any regexes on the LF versions of these transcripts - I get an error message about the document being read only. Guy038 wrote a new regex that worked like a charm, though.

                                    Thanks again for your help leading me through this. I appreciate it :)
