New user having trouble getting line/blank operations to work
-
@motreo said in New user having trouble getting line/blank operations to work:
export wasn’t the right word to use. What I meant is that when I go to save the file from its source (downsub.com), I get a transcript with LF endings if I choose Open with Notepad++.
Well maybe export was the right word! :-)
The saving of the file by whatever is saving it is doing so to a Linux file format. No problem for Notepad++, but as a user of the data, you have to know if you want to keep it in Linux format, or change it over to Windows format.
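For anyone who wants to check line endings outside Notepad++, here is a minimal Python sketch (not something from the original posts). The function names `count_line_endings` and `to_crlf` are made up for illustration, and the sketch only knows about the three common styles (CRLF, LF, CR).

```python
import re


def count_line_endings(text: str) -> dict:
    """Count Windows (CRLF), Linux (LF) and old Mac (CR) line endings."""
    crlf = text.count("\r\n")
    lf = text.count("\n") - crlf   # LFs that are not part of a CRLF pair
    cr = text.count("\r") - crlf   # CRs that are not part of a CRLF pair
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def to_crlf(text: str) -> str:
    """Convert every line ending to the Windows style (CRLF)."""
    # "\r\n" is listed first so an existing CRLF pair is matched as one unit.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


sample = "line one\nline two\r\nline three\rline four"
print(count_line_endings(sample))
print(repr(to_crlf(sample)))
```

If the text comes from a file, open it with newline="" so Python does not translate the endings before they are counted.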
-
@motreo said in New user having trouble getting line/blank operations to work:
Is there a way to get rid of these non-breaking space characters?
Probably should confirm it first. Do a regular expression search for \xa0 and see if it matches the suspect spaces.
-
@motreo said in New user having trouble getting line/blank operations to work:
@terry-r Thanks for explaining things further. Here’s a screenshot of some text showing where dots are placed:
At this point I think you REALLY need to provide examples in the format I requested (read that FAQ post). We need actual text to work on to help you. Images do not show the information, so we are only guessing (informed guesses they might be).
Your issue is certainly fixable, just need the “real text”.
Terry
-
@terry-r said in New user having trouble getting line/blank operations to work:
At this point I think you REALLY need to provide examples in the format I requested (read that FAQ post). We need actual text
I think OP tried to do this, when he said:
I copied and pasted this text but the spaces all seemed to be “normal”. :-(
-
@alan-kilborn said in New user having trouble getting line/blank operations to work:
I copied and pasted this text but the spaces all seemed to be “normal”. :-(
Apparently another quirk of the forum. If I copy/paste from there, or View Source on the webpage, “the starry” just has normal spaces. If I use my moderator powers to “edit” the post (don’t worry, @motreo, I didn’t save my edits), and copy from the original post, it’s actually the\xA0\xA0\x20starry. So yes, there are two NBSP (\xA0) in between those words.
So @motreo, you do have fancy spaces. I recommend you just do a search for \xA0 and replace with \x20, which will replace all NBSP with normal spaces.
(We regulars will have to try to remember that even the text boxes can edit some characters, including the backslash-[ and NBSP)
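The same check and replacement can be sketched in Python, in case it helps to see it outside the Replace dialog. This is only an illustration: `find_nbsp` and `replace_nbsp` are hypothetical helper names, and the sample string is typed in by hand to mimic the two NBSP plus one normal space described above.

```python
import re

# Hand-built sample: two no-break spaces (U+00A0) followed by a normal space.
sample = "the\xa0\xa0 starry"


def find_nbsp(text: str) -> list:
    """Return the index of every no-break space, like searching for \\xa0."""
    return [m.start() for m in re.finditer(r"\xa0", text)]


def replace_nbsp(text: str) -> str:
    """Replace every no-break space with a normal space (\\x20)."""
    return re.sub(r"\xa0", "\x20", text)


print(find_nbsp(sample))           # positions of the suspect spaces
print(repr(replace_nbsp(sample)))  # only ordinary spaces remain
```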
-
@motreo said in New user having trouble getting line/blank operations to work:
I don’t know anything about Unicode
This may be a problem on a bigger scale, given what you seem to be doing. Maybe best to go off and do some learning.
-
@alan-kilborn that’s it - all the spaces where there isn’t any dot are highlighted when doing a search for \xa0
-
@peterjones I recommend you just do a search for \xA0 and replace with \x20, which will replace all NBSP with normal spaces.
Worked like a charm! And allows me to get rid of those extra spaces using search/replace :)
-
@guy038 Per the recommendation of @peterjones, I got rid of all the funky non-normal blank spaces by replacing \xa0 with \x20. Now that I’m left with a transcript with CRLF line endings and only normal blank spaces, do you know what expression can be used to join only consecutive lines + lines separated by a single blank line?
-
@motreo ,
single spaced will be joined as will double-spaced but not triple spaced
- FIND = (?<![\r\n])(\R){1,2}(?!\R)
- REPLACE = \x20
- REPLACE ALL
This says “for matches that don’t have a \r or \n before it, match 1 or 2 newline sequences, which aren’t followed by a newline” and “replace with a space”. This will collapse lines that are single spaced or double spaced into one line, but triple spaced or wider will be left unedited.
This is just one solution that seems to fit your description. TIMTOWTDI.
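A rough Python equivalent, for experimenting outside Notepad++. Python's re module has no \R, so this sketch spells out a newline sequence by hand (the `NL` fragment is an assumption that only CRLF, LF and CR occur), and `join_lines` is a hypothetical name.

```python
import re

# Stand-in for \R: CRLF, a lone LF, or a lone CR (never the CR half of a CRLF).
NL = r"(?:\r\n|\n|\r(?!\n))"

# Not preceded by \r or \n, then 1 or 2 newline sequences, not followed by another.
JOIN = re.compile(rf"(?<![\r\n]){NL}{{1,2}}(?!{NL})")


def join_lines(text: str) -> str:
    """Join single- and double-spaced lines with a space; leave wider gaps alone."""
    return JOIN.sub(" ", text)


sample = "a\r\nb\r\n\r\nc\r\n\r\n\r\nd"
print(repr(join_lines(sample)))
```

In the sample, the single and double line breaks become spaces, while the run of three line breaks before "d" is left untouched.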
-
@motreo said in New user having trouble getting line/blank operations to work:
do you know what expression can be used to join only consecutive lines + lines separated by a single blank line?
I will repeat my request. The previous time you showed “real text” was after edits had been done. So if you could provide the “original” text in that manner you might get a response that can fix ALL of them in 1 go. If not, then at least once you have the process sorted you can make a macro of the steps. This can then be saved and played back whenever you have a transcript to process.
Terry
-
@terry-r Sorry, I think I’m still a bit confused by what you mean when you say “real text”. Do you mean something like this?
Imagine the day a civilization discovers the starry night sky above contains billions of billions of worlds awaiting their arrival. Now imagine the day they realize those voyages will never be made. So earlier this week we were talking about Kessler Syndrome, collision cascades around planets that
I’m not sure if the original text needs to be the starting point - now that I know how to quickly get rid of no-break spaces and change line endings to CRLF, wouldn’t it make more sense to treat that as my starting point?
-
@motreo said in New user having trouble getting line/blank operations to work:
wouldn’t it make more sense to treat that as my starting point?
If you want to start at the new starting point that is OK by me. But what I’m saying is that you have this original transcript that includes NBSP (non-breaking spaces) and LF without CR codes.
Regular expressions (regex) are a wondrous thing. They magically fix all that, well maybe not magically but they are very powerful if coded well. There’s a real chance 1 regex can do it all! You’d open the “original” transcript in Notepad++, hit a macro and voila, the result appears as you want it.
Terry
PS thanks for the latest example. That format allows us (the coders) to take a stab at the (almost) real data, albeit without NBSP, and give you a solution.
-
single spaced will be joined as will double-spaced but not triple spaced
What does triple spaced mean in this context? I ran the expression and everything was merged into one long, single paragraph. There are extra spaces left between words (between 2-5 spaces) but those are easy to edit out with search/replace. Let me post a before and after:
Before
Imagine the day a civilization discovers the starry night sky above contains billions of billions of worlds awaiting their arrival. Now imagine the day they realize those voyages will never be made. So earlier this week we were talking about Kessler Syndrome, collision cascades around planets that
After
Imagine the day a civilization discovers the starry night sky above contains billions of billions of worlds awaiting their arrival. Now imagine the day they realize those voyages will never be made. So earlier this week we were talking about Kessler Syndrome, collision cascades around planets that
-
@terry-r said in New user having trouble getting line/blank operations to work:
There’s a real chance 1 regex can do it all! You’d open the “original” transcript in Notepad++, hit a macro and voila, the result appears as you want it.
@motreo
I think I have the 1 regex to do:
- convert NBSP (\xa0) into ordinary spaces (\x20).
- remove multiple spaces together, replacing with 1 ordinary space.
- remove 1 or 2 line feeds with possible spaces\NBSP in between (\n seen as LF).
- make any 3 line feeds together (with possible spaces\NBSP in between) just a single CRLF (Windows carriage return\line feed).
So with your original transcript you should be able to open that in Notepad++, copy the next 2 bits of code (highlighted in red, just use copy and paste) into the appropriate field in the Replace function and click on Replace All.
Find What: (\xa0|\x20)(?=([\xa0\x20])?)|\n([\xa0\x20]*\n([\xa0\x20]*\n)?)?
Replace With: (?{1}(?{2}:\x20))(?{4}\r\n)
It will look complicated, don’t worry about that at the moment. I will describe what it is doing, once you confirm it works as expected.
Let us know how it went. As you only provided a small sample, there is a chance it may miss something you were unaware of; this happens.
Terry
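For readers curious how the conditional replacement behaves, here is a hedged Python sketch of the LF-only regex above. Python's re has no (?{N}...) replacement syntax, so a replacement function stands in for it, based on my reading of the Replace With field. `terry_replace` is a made-up name, and the samples are invented rather than taken from the real transcript.

```python
import re

# Same Find What expression as above (written for LF-only text).
PATTERN = re.compile(
    r"(\xa0|\x20)(?=([\xa0\x20])?)|\n([\xa0\x20]*\n([\xa0\x20]*\n)?)?"
)


def _repl(m: re.Match) -> str:
    if m.group(1) is not None:
        # A space or NBSP: drop it if another one follows, else emit one space.
        return "" if m.group(2) is not None else "\x20"
    # One or two LFs are removed; three LFs (group 4 set) become a single CRLF.
    return "\r\n" if m.group(4) is not None else ""


def terry_replace(text: str) -> str:
    return PATTERN.sub(_repl, text)


print(repr(terry_replace("the\xa0\xa0 starry \nnight\n\n\nsky")))
```

Note that a single or double LF is replaced with nothing at all, so the lines only stay separated by a space when the original line already ended with a space or NBSP.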
-
@terry-r Just gave it a try, using a version of the transcript with CRLF line endings and NBSPs intact (I just copied and pasted the text into Notepad++ rather than downloading the transcript file and selecting Open in Notepad++, which yields LF line endings and gives the error message that the document is read only when trying to run expressions).
The NBSPs were removed and CRLF line endings changed to just CR, but the lines themselves were unchanged/no lines were merged. I checked Regular expression and Wrap around and left the Transparency box blank. Thanks again for helping me with this, I really appreciate it.
-
@motreo said in New user having trouble getting line/blank operations to work:
using a version of the transcript with CRLF line endings and NBSPs intact (I just copied and pasted the text into Notepad++ rather than downloading the transcript file and selecting Open in Notepad++, which yields LF line endings and gives the error message that the document is read only when trying to run expressions).
I meant for you to use the “original” version with the LF. If you’d rather use an already altered version you need to let us know. Regexes are created specifically for a task using the data in a pre-determined format. Altering anything in the data will very likely throw out the result, as you saw.
What you could try is this altered Find What code: (\xa0|\x20)(?=([\xa0\x20])?)|\R([\xa0\x20]*\R([\xa0\x20]*\R)?)?
however I haven’t tested it.
Terry
PS I actually explained what my regex was doing, I thought it would have been obvious it was concerned with the LF version.
-
Hello, @motreo, @terry-r, @alan-kilborn, @peterjones and All,
OK, I understood… Each time a line-break occurs in @motreo’s text, it may be preceded by possible horizontal space characters, like \x20, \xa0 or \t. Moreover, if some horizontal space chars are found between words, they may be different from a simple space char. So, a solution could be:
- Open the Replace dialog (Ctrl + H)
- SEARCH (\h*\R){3,}|(\h*\R){1,2}|(?![\r\n])([\t\xa0]+|\x20{2,})(?!\R)
- REPLACE (?1\r\n\r\n)(?2\x20)?3\x20 if the line endings must be CRLF, after replacement
OR
- REPLACE (?1\n\n)(?2\x20)?3\x20 if the line endings must be LF, after replacement
- Untick all box options
- Tick the Wrap around option
- Select the Regular expression search mode
- Click once on the Replace All button or several times, till the end of process, on the Replace button
- Hit the ESC button to close the Replace dialog
Note: Maybe you’ll have to delete the last space character at the very end of your text. Not a big task, anyway!
Best Regards,
guy038
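A minimal Python sketch of the same three-way replacement, for anyone who wants to script it. This is an approximation, not guy038's own code: Python's re has neither \h nor \R, so `H` and `NL` are stand-ins that assume only spaces, tabs, NBSP and CRLF/LF/CR occur, and `clean_transcript` is a hypothetical name.

```python
import re

H = r"[ \t\xa0]"                # stand-in for \h (space, tab, NBSP only)
NL = r"(?:\r\n|\n|\r(?!\n))"    # stand-in for \R (CRLF, LF, or lone CR)

PATTERN = re.compile(
    rf"((?:{H}*{NL}){{3,}})"                        # group 1: 3+ line breaks
    rf"|((?:{H}*{NL}){{1,2}})"                      # group 2: 1 or 2 line breaks
    rf"|(?![\r\n])([\t\xa0]+|\x20{{2,}})(?!{NL})"   # group 3: odd or repeated spaces
)


def clean_transcript(text: str, eol: str = "\r\n") -> str:
    """Join short gaps with a space and reduce wide gaps to one blank line."""
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return eol * 2      # keep a paragraph break (one blank line)
        # Groups 2 and 3 both collapse to a single normal space.
        return "\x20"
    return PATTERN.sub(repl, text)


sample = "one \xa0\ntwo\n\nthree\n\n\nfour\xa0\xa0five"
print(repr(clean_transcript(sample, eol="\n")))
```

Passing eol="\n" corresponds to the LF variant of the REPLACE field, and the default corresponds to the CRLF variant.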
-
@guy038 It worked!!! I’m so glad - thanks so much for your help with this! :)