How to start?

Mon Onkel

Hello everyone.

I’m not a programmer or even interested in programming, but I was advised by a programmer friend to use notepad++ to clean up some quite large text documents. Windows Notepad does not have the options I need, and when I try to edit the files in Word my PC faints and must be restarted, and the remaining Word document is of course blank.

But how do I start? Is there a Dummies guide to notepad++, or can I hope for help from the community if I post my amateur questions here? If yes, here is my first question in Word terminology, so you see the level of my problem:

How do I find and delete all paragraph marks that act like manual line breaks?

(And how do I sign in as a community member?)

Arne Lodin
Stockholm, Sweden

Alan Kilborn

@Mon-Onkel said:

(And how do I sign in as a community member?)

Well…I think…that you’ve already done this! :)

Is there a Dummies guide to notepad++

Not really, no…

advised by a programmer friend to use notepad++

TBH, this person may be the best resource to get over any initial hurdles

I’m not a programmer but I was advised to use notepad++ to clean up some documents

Not sure why you’d think you’d have to be a programmer to do this…and do it well and efficiently…

can I hope for help from the community if I post my amateur questions here?

As long as your questions are relevant, a definitive YES! We’ll let you know if you get off track with your questions (we call that baking cookies here).

How do I find and delete all paragraph marks that act like manual line breaks?

It’s a pretty good first question, but unfortunately I (at least) am not sure what a “paragraph mark that acts like manual line break” means. :-(

I could make assumptions (e.g., you have a bunch of non-empty lines followed by an empty line followed by another bunch of non-empty lines, and you want to remove that pesky empty line) but assumptions probably just mean that I do a lot of mental guesswork that is in all likelihood WRONG. :-(

If it helps to explain your problem, by all means post some sample text here that shows what you want to remove, and then explain as best you can what you want to achieve with that text. Ideally showing some “before” and desired “after” text.

If you want the best possible representation of your sample text here, you can (probably) do the following:

Select your text in Notepad++ (presumes you know how to do that – hint: click and drag with mouse or use shift+arrows on keyboard)
Press TAB key once (this indents your text, hopefully correctly)
Press ctrl+c (copy)
Press ctrl+z (this restores your text to original form)
Switch to web browser with Community forum posting you are composing active
Press ctrl+v (paste)

If all goes well, you will see some black background text in the Preview window on the right as you are composing a posting; it will look like this, but again, on the RIGHT:

hello
there
I'm
some
text

Good luck!

Mon Onkel

Alan, thank you for your answer.

Asking my programmer friend for help is a good idea, but he’s quite busy right now planning his wedding. And there is a saying that God helps those who help themselves, so I hope for help from the gurus here in the community.

Over to your questions, and I now believe that for my first problem I must first do something with the .txt-files before I start editing them.

“Paragraph mark that acts like manual line break” for me looks like this ¶ in Word. There it is easy to search for them and delete them with the usual search-and-replace function. The text files I am editing are filled with line breaks that I want to remove, but the line breaks are invisible both to me and to the search function in standard Notepad.

Here is an example of what the text looks like in .txt:

Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt
finns skivor (OSB, masonite, spån och gips) och över dessa
skivor är det antingen målat eller också finns tapet. Sedan jag
flyttade in i villan på 90-talet har jag successivt renoverat de
olika rummen, en del rum så nyligen som för fyra år sedan.
Efter bara ett par år börjar väggarna svartna, i vissa hela hela
ytor, i andra fall följer svärtan de bakomliggande reglarna
eller också följer svärtan bakomliggande spikrader eller
fograder av cement.

If I open the document in Word I get this editable result:

Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt¶
finns skivor (OSB, masonite, spån och gips) och över dessa¶
skivor är det antingen målat eller också finns tapet. Sedan jag¶
flyttade in i villan på 90-talet har jag successivt renoverat de¶
olika rummen, en del rum så nyligen som för fyra år sedan.¶
Efter bara ett par år börjar väggarna svartna, i vissa hela hela¶
ytor, i andra fall följer svärtan de bakomliggande reglarna¶
eller också följer svärtan bakomliggande spikrader eller¶
fograder av cement.¶

After search-and-replace in Word I get the result I want:

Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt finns skivor (OSB, masonite, spån och gips) och över dessa skivor är det antingen målat eller också finns tapet. Sedan jag flyttade in i villan på 90-talet har jag successivt renoverat de olika rummen, en del rum så nyligen som för fyra år sedan. Efter bara ett par år börjar väggarna svartna, i vissa hela hela ytor, i andra fall följer svärtan de bakomliggande reglarna eller också följer svärtan bakomliggande spikrader eller fograder av cement.

So, how to proceed without using Word?

Ekopalypse

@Mon-Onkel

If you look carefully at the Notepad++ toolbar, you will see the same ¶ sign. If you press it, you will see, depending on the current setup, CR, LF, or CRLF at the end of each line. Those are the end-of-line (EOL) characters.
To remove them, you can use the find/replace dialog with “regular expression” checked:
find what: \r\n
replace with: (leave empty)
press Replace All
This works if you see CRLF at each line end; if you see only CR, use just \r, and if you see only LF, use \n.

One other alternative in this particular case is to use the key combination
ctrl+a
ctrl+j

this will join all selected lines.
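
For readers who would rather script the same clean-up outside the editor, here is a minimal Python sketch of the two approaches above. It is an illustration only, not something from this thread: the function names `remove_eols` and `join_lines` are made up, and the space separator in `join_lines` is an assumption about how joined lines should be separated.

```python
def remove_eols(text: str, eol: str = "\r\n") -> str:
    """Delete every occurrence of one end-of-line sequence (like replacing \\r\\n with nothing)."""
    return text.replace(eol, "")


def join_lines(text: str) -> str:
    """Join all lines into one, putting a single space where each line break was (assumed separator)."""
    return " ".join(text.splitlines())


sample = "olika rummen, en del rum\r\nså nyligen som\r\nför fyra år sedan."
print(remove_eols(sample))  # words on either side of a break end up glued together
print(join_lines(sample))   # a space is kept between the joined lines
```

Note the difference between the two: deleting the line break outright only gives readable text if each line already ends with a space, while joining with a separator does not depend on that.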

Mon Onkel

@Ekopalypse

Thank you for your advice, it worked perfectly.

Mon Onkel

@Ekopalypse

New problem:

I would like to replace single CRLFs with a space but keep double CRLFCRLFs for future formatting.

Example of what it looks like:

Svartnande väggar¶
¶
Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt¶
finns skivor (OSB, masonite, spån och gips) och över dessa¶
skivor är det antingen målat eller också finns tapet. Sedan jag¶
flyttade in i villan på 90-talet har jag successivt renoverat de¶
olika rummen, en del rum så nyligen som för fyra år sedan.¶
Efter bara ett par år börjar väggarna svartna, i vissa hela hela¶
ytor, i andra fall följer svärtan de bakomliggande reglarna¶
eller också följer svärtan bakomliggande spikrader eller¶
fograder av cement.¶

How I want it:

Svartnande väggar¶
¶
Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt finns skivor (OSB, masonite, spån och gips) och över dessa skivor är det antingen målat eller också finns tapet. Sedan jag flyttade in i villan på 90-talet har jag successivt renoverat de olika rummen, en del rum så nyligen som för fyra år sedan. Efter bara ett par år börjar väggarna svartna, i vissa hela hela ytor, i andra fall följer svärtan de bakomliggande reglarna eller också följer svärtan bakomliggande spikrader eller fograder av cement.¶

PeterJones

@Mon-Onkel said:

I would like to replace single CRLFs with a space but keep double CRLFCRLFs for future formatting.

For that, you move into the realm of regular expressions (regex), which can be used in the notepad++ search/replace window.

To define the regex, I think of your request as “remove any newline sequence that isn’t immediately followed by another newline sequence”. Translating to regex syntax, set your Find What to \R(?!\R) and your replace to a single space (cannot see it in the window; if that worries you, use the escape sequence \x20, which is a space). Make sure “regular expression” is selected.

Unfortunately, sometimes your first guess is wrong (like mine was), and you have to do an iterative process: the problem here is that the newline between the blank line and the paragraph gets deleted with that expression.
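
The failure described above can be reproduced outside Notepad++ with a small Python sketch. Python’s `re` module has no `\R`, so the `NL` pattern below is an assumed stand-in that covers CRLF, a lone LF, or a lone CR; the function name `first_guess` is hypothetical.

```python
import re

# Assumed stand-in for \R: CRLF, a lone LF, or a lone CR.
NL = r"(?:\r\n|\n|\r(?!\n))"


def first_guess(text: str) -> str:
    """Replace every line break that is not immediately followed by another line break."""
    return re.sub(NL + r"(?!" + NL + r")", " ", text)


sample = "Svartnande väggar\r\n\r\nJag har ett 50-talshus\r\nmed väggar av Ytong."
# The second line break of the blank-line pair is followed by text, so it is
# replaced too, and the paragraph separator is damaged.
print(repr(first_guess(sample)))
```

In this sketch the title keeps only one line break and the paragraph starts with a stray space, which is the unwanted side effect.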

At that point, you’ve got a couple of choices. You can get into more advanced regular expressions (and my negative lookahead was already pretty advanced), which would need an if/then construct (“if there’s only one newline, replace it with a space; if there are multiple newlines, keep them intact”). But it’s often easier (and faster to debug) to just do a few regular expressions in a row; in this case, I’d do “look for multiple newlines in a row, and replace them with a dummy character (picking some character not already in my document); then replace all remaining (single) newlines with a space; then replace the dummy character with two newlines”.

find = \R\R+ (two or more newline sequences), replace = ☺ (smiley, or other character of your choice)
find = \R (since there should only be single newlines left, don’t need the negative lookahead), replace = \x20 (space)
find = ☺, replace = \r\n\r\n (two Windows newlines *)
- *: it’s confusing; \R matches \r\n, \n, or \r on the search side, but isn’t valid on the replace side, so I had to spell out the CRLF manually using \r\n; I still use a single \R in my searches, because it’s easier to type

That will change

Svartnande väggar

Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt
finns skivor (OSB, masonite, spån och gips) och över dessa
skivor är det antingen målat eller också finns tapet. Sedan jag
flyttade in i villan på 90-talet har jag successivt renoverat de
olika rummen, en del rum så nyligen som för fyra år sedan.
Efter bara ett par år börjar väggarna svartna, i vissa hela hela
ytor, i andra fall följer svärtan de bakomliggande reglarna
eller också följer svärtan bakomliggande spikrader eller
fograder av cement.

into

Svartnande väggar

Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt finns skivor (OSB, masonite, spån och gips) och över dessa skivor är det antingen målat eller också finns tapet. Sedan jag flyttade in i villan på 90-talet har jag successivt renoverat de olika rummen, en del rum så nyligen som för fyra år sedan. Efter bara ett par år börjar väggarna svartna, i vissa hela hela ytor, i andra fall följer svärtan de bakomliggande reglarna eller också följer svärtan bakomliggande spikrader eller fograder av cement.
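
The three-step placeholder approach can also be sketched in Python, for anyone who wants to run it over many files. This is a hedged illustration rather than the exact Notepad++ behaviour: `NL` is an assumed substitute for `\R`, and `unwrap_paragraphs`, `PLACEHOLDER`, and the `eol` parameter are names invented for the example.

```python
import re

# Assumed stand-in for \R: CRLF, a lone LF, or a lone CR.
NL = r"(?:\r\n|\n|\r(?!\n))"
PLACEHOLDER = "\u263a"  # the smiley used as a dummy character


def unwrap_paragraphs(text: str, eol: str = "\r\n") -> str:
    """Turn single line breaks into spaces while keeping paragraph breaks."""
    if PLACEHOLDER in text:
        raise ValueError("placeholder already occurs in the text; pick another one")
    text = re.sub(NL + r"{2,}", PLACEHOLDER, text)  # step 1: runs of 2+ line breaks -> dummy
    text = re.sub(NL, " ", text)                    # step 2: remaining single breaks -> space
    return text.replace(PLACEHOLDER, eol * 2)       # step 3: dummy -> two line breaks


sample = "Svartnande väggar\r\n\r\nJag har ett 50-talshus\r\nmed väggar av Ytong."
print(repr(unwrap_paragraphs(sample)))
```

As with the three replacements in the editor, any run of three or more line breaks collapses to exactly two, and a single line break at the very end of the text becomes a space.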

-----
FYI: I often add this to my response in regex threads, unless I am sure the original poster has seen it before. Here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.

If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.

Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

guy038

Hello, @mon-onkel, @Peterjones and All,

Peter, a 1-step solution, which could replace your 3-step regex, is :

SEARCH (\R)\K\R+|\R

REPLACE ?1\1:\x20

Using, exclusively, the Replace All button

Notes :

The additional consecutive line-breaks, after a first one, are replaced with a copy of the first line-break only. So with \r\n, \n, or \r, depending on the type of file !
In the case of a single line-break ( second alternative of the regex ), it’ll be replaced with a single space char, due to the conditional replacement structure
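
The same one-pass idea can be approximated in Python, with two caveats that are assumptions of this sketch: Python’s `re` has neither `\R` nor `\K`, so `NL` stands in for `\R` and the whole run of line breaks is matched and rebuilt; and the conditional replacement is expressed as a callback instead of the `?1…:…` syntax. The names `PATTERN`, `_replace`, and `unwrap_one_pass` are hypothetical.

```python
import re

# Assumed stand-in for \R: CRLF, a lone LF, or a lone CR.
NL = r"(?:\r\n|\n|\r(?!\n))"

# First alternative: a line break (captured) followed by one or more others.
# Second alternative: a single line break.
PATTERN = re.compile(r"(" + NL + r")" + NL + r"+|" + NL)


def _replace(match: re.Match) -> str:
    first = match.group(1)
    if first is not None:
        # Without \K the first line break is part of the match, so emit it
        # twice: once for itself, once for the breaks that followed it.
        return first * 2
    return " "  # a single line break becomes a space


def unwrap_one_pass(text: str) -> str:
    return PATTERN.sub(_replace, text)


sample = "Svartnande väggar\r\n\r\n\r\nJag har ett 50-talshus\r\nmed väggar av Ytong."
print(repr(unwrap_one_pass(sample)))
```

Because the replacement reuses the captured line break, the output keeps whichever line-ending style the file already has.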

Remark :

You have to be careful when using the \R syntax. Indeed, as \R may match \n, \r, or \r\n, and as the regex engine tries, by all means, to find an overall match with the back-tracking mechanism, some false positive matches may occur :-((

So, rather than considering which character should not occur after a line break \R, we could just define which characters must occur before and after a line-break. This gives the following simple regex :

SEARCH (?-s).\K\R(?=.)

REPLACE \x20

which means : replace any line-break, both preceded and followed by a standard character, with a space character
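
A rough Python counterpart of this “text on both sides” rule is sketched below. It is an assumption-laden translation, not the Notepad++ regex itself: Python’s `.` still matches `\r`, so the sketch uses `[^\r\n]` for “a standard character”, a lookbehind replaces `\K`, and the name `unwrap_between_text` is invented.

```python
import re

# A line break (CRLF, lone LF, or lone CR) with an ordinary character
# immediately before it and immediately after it.
PATTERN = re.compile(r"(?<=[^\r\n])(?:\r\n|\n|\r(?!\n))(?=[^\r\n])")


def unwrap_between_text(text: str) -> str:
    """Replace a line break with a space only when text sits on both sides of it."""
    return PATTERN.sub(" ", text)


sample = "Svartnande väggar\r\n\r\nJag har ett 50-talshus\r\nmed väggar av Ytong.\r\n"
print(repr(unwrap_between_text(sample)))
```

Compared with the multi-step variants, this version leaves blank lines untouched however many there are, and it keeps a line break at the very end of the text, since nothing follows it.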

Best Regards,

guy038

Mon Onkel

Hello @guy038 and All,

I’m sorry that I don’t follow the formatting etiquette in this forum, but as I wrote in my first post I’m not a programmer or even interested in programming, and I hope that I won’t need your help on any more projects. Please bear with me.

@guy038, thank you for your advice, it worked perfectly.

My last (I hope) problem is that I want to remove every email address in the document.

What I want is
From: Arne Lodin
or
From: Arne Lodin Stockholm, Sweden

What I don’t want is
From: Arne Lodin altitune@comhem.se
or
From: Arne Lodin [altitune@comhem.se] Stockholm, Sweden

So, the mail address is sometimes surrounded by a space and a line-break, and sometimes by two spaces.

Mon Onkel

@Mon-Onkel

In the first “What I don’t want” example, the mail address is surrounded by angle brackets (Vs lying down: the smaller-than and larger-than signs).

PeterJones

@Mon-Onkel said:

formatting etiquette

For me, it’s not about “etiquette”. It’s about clarity. When you don’t use the formatting tools of the forum (here, or anyplace else), it makes communication harder – it makes it harder for us to understand what you want, and thus makes it harder for us to help you. Since your goal is presumably “getting help”, I would think you would use any tool at your disposal to make it easier to get help.

For example, since you didn’t bother looking at the page on formatting hints, and thus didn’t include your newest set of data in Markdown to make sure it was interpreted correctly, I cannot know whether your data is:

From: Arne Lodin altitune@comhem.se

or

From: Arne Lodin <altitune@comhem.se>

because both render the same in the forum:
-----
From: Arne Lodin altitune@comhem.se
From: Arne Lodin altitune@comhem.se
-----

update: oh, I see that you at least caught that, and posted an update while I was typing this up.

I’m not a programmer or even interested in programming

You don’t need to be. But you should be willing to learn how to use the tool, rather than asking people to just do something for you. I understand on the first one, you’re probably just looking for the quick fix. But you’re on to your third search/replace, so I would hope for a willingness to learn.

Ah, well. Here’s one that might do what you want. But there are a lot of edge cases it doesn’t test for, so I normally wouldn’t recommend this without getting a better idea of what your data is really like. Maybe you’ll at least take the advice of “save and backup your data before running this on critical data”, because I cannot guarantee it will do what you want with no unintended side effects.

FIND = \S+\@\S+\h*
REPLACE = empty
MODE = regular expression
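
To try this kind of replacement on a copy of the data before touching the real file, a small Python sketch can help. It assumes `[ \t]*` is an acceptable substitute for `\h*` (Python’s `re` has no `\h`), and the names `EMAIL_TOKEN` and `strip_addresses` are hypothetical.

```python
import re

# Any run of non-space characters containing "@", plus trailing spaces or tabs.
EMAIL_TOKEN = re.compile(r"\S+@\S+[ \t]*")


def strip_addresses(text: str) -> str:
    """Remove every whitespace-delimited token that contains an @ sign."""
    return EMAIL_TOKEN.sub("", text)


lines = [
    "From: Arne Lodin <altitune@comhem.se>",
    "From: Arne Lodin [altitune@comhem.se] Stockholm, Sweden",
]
for line in lines:
    print(repr(strip_addresses(line)))
```

The pattern is deliberately crude: it removes surrounding brackets along with the address, but it would also remove any other token that happens to contain an @ between non-space characters, so checking the output on sample lines first is worthwhile.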

Good luck.

Mon Onkel

Dear @PeterJones ,

I’m very grateful for your help, and your last advice was spot on.

I’m 78 years old and have no intention of trying to learn a new discipline that I hopefully will not need again. So, again, please bear with me.

Alan Kilborn

@Mon-Onkel said:

I’m 78 years old and have no intention of trying to learn a new discipline that I hopefully will not need again

So you are just going to keep posting all of your needs here and waiting for answers? Sorry, this doesn’t really fly. People like to help, but they want to see people LEARN from that help, and apply the techniques to their similar problems. People that demonstrate that they aren’t learning and just keep asking are politely told that with all the help provided already, they should be able to extrapolate and now solve their own problems.

78 is not an excuse. My 91-year-old great-aunt just got her first iPhone, and is teaching my 16-year-old son stuff about it that he didn’t know…

guy038

Hi, @alan-kilborn,

Please congratulate your great-aunt on being such a modern person !! To be honest, I’m pretty sure I’ll be uncomfortable with most of the new gadgets of that time, that is…, in 23 years ;-))

BR

guy038