How to start?
-
Thank you for your advice, it worked perfectly.
-
New problem:
I would like to replace single CRLFs with a space but keep double CRLFCRLFs for future formatting.
Example of what it looks like:
Svartnande väggar¶
¶
Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt¶
finns skivor (OSB, masonite, spån och gips) och över dessa¶
skivor är det antingen målat eller också finns tapet. Sedan jag¶
flyttade in i villan på 90-talet har jag successivt renoverat de¶
olika rummen, en del rum så nyligen som för fyra år sedan.¶
Efter bara ett par år börjar väggarna svartna, i vissa hela hela¶
ytor, i andra fall följer svärtan de bakomliggande reglarna¶
eller också följer svärtan bakomliggande spikrader eller¶
fograder av cement.¶
How I want it:
Svartnande väggar¶
¶
Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt finns skivor (OSB, masonite, spån och gips) och över dessa skivor är det antingen målat eller också finns tapet. Sedan jag flyttade in i villan på 90-talet har jag successivt renoverat de olika rummen, en del rum så nyligen som för fyra år sedan. Efter bara ett par år börjar väggarna svartna, i vissa hela hela ytor, i andra fall följer svärtan de bakomliggande reglarna eller också följer svärtan bakomliggande spikrader eller fograder av cement.¶
-
@Mon-Onkel said:
I would like to replace single CRLFs with a space but keep double CRLFCRLFs for future formatting.
For that, you move into the realm of regular expressions (regex), which can be used in the Notepad++ search/replace window.
To define the regex, I think of your request as “remove any newline sequence that isn’t immediately followed by another newline sequence”. Translating to regex syntax, set your Find What to
\R(?!\R)
and your Replace With to a single space (you cannot see it in the window; if that worries you, use the escape sequence \x20, which is a space). Make sure “Regular expression” is selected.
Unfortunately, sometimes your first guess is wrong (like mine was), and you have to iterate: the problem here is that the newline between the blank line and the paragraph gets deleted with that expression.
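For anyone who wants to reproduce that failure outside Notepad++, here is a minimal Python sketch. Python's re module has no \R, so \r?\n is used as a stand-in that only covers CRLF and LF (an assumption; bare CR is ignored). The sample text and variable names are made up for illustration.

```python
import re

# Assumed stand-in for Notepad++'s \R: matches CRLF or LF only.
NL = r"\r?\n"

# Hypothetical sample: a title, a blank line, then a wrapped paragraph.
text = "Title\r\n\r\nfirst line\r\nsecond line\r\n"

# First attempt: replace any newline that is not followed by another newline.
first_try = re.sub(NL + r"(?!" + NL + r")", " ", text)

# The second newline of the blank-line pair is followed by text, so it
# matches too, and the blank line between title and paragraph is lost.
print(repr(first_try))
```

The lookahead only protects the first newline of a pair; the last newline of any run is always followed by ordinary text, so it still gets replaced.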
At that point, you’ve got a couple of choices. You can get into more advanced regular expressions (and my negative lookahead was already pretty advanced), which here would be an if/then construct (“if there’s only one newline, replace it with a space; if there are multiple newlines, keep them intact”). But it’s often easier (and faster to debug) to just run a few regular expressions in a row. In this case, I’d look for multiple newlines in a row and replace them with a dummy character (picking some character not already in my document); then replace all remaining (single) newlines with a space; then replace the dummy character with two newlines.
- find = \R\R+ (two or more newline sequences), replace = ☺ (smiley, or other character of your choice)
- find = \R (since there should only be single newlines left, the negative lookahead isn’t needed), replace = \x20 (space)
- find = ☺, replace = \r\n\r\n (two Windows newlines *)
  - *: it’s confusing; \R matches either \r\n or \n on the search side, but isn’t valid on the replace side, so I had to manually do the CRLF using \r\n; I still use a single \R in my searches, because it’s easier to type
That will change
Svartnande väggar

Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt
finns skivor (OSB, masonite, spån och gips) och över dessa
skivor är det antingen målat eller också finns tapet. Sedan jag
flyttade in i villan på 90-talet har jag successivt renoverat de
olika rummen, en del rum så nyligen som för fyra år sedan.
Efter bara ett par år börjar väggarna svartna, i vissa hela hela
ytor, i andra fall följer svärtan de bakomliggande reglarna
eller också följer svärtan bakomliggande spikrader eller
fograder av cement.
into
Svartnande väggar

Jag har ett 50-talshus med väggar av Ytong/lättbetong. Invändigt finns skivor (OSB, masonite, spån och gips) och över dessa skivor är det antingen målat eller också finns tapet. Sedan jag flyttade in i villan på 90-talet har jag successivt renoverat de olika rummen, en del rum så nyligen som för fyra år sedan. Efter bara ett par år börjar väggarna svartna, i vissa hela hela ytor, i andra fall följer svärtan de bakomliggande reglarna eller också följer svärtan bakomliggande spikrader eller fograder av cement.
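The three find/replace passes can also be sketched in Python, for readers who prefer to script the change. This is an illustration under assumptions, not the Notepad++ implementation: \r?\n stands in for \R (CRLF and LF only), and the function name join_wrapped_lines and the DUMMY constant are hypothetical.

```python
import re

NL = r"\r?\n"      # assumed stand-in for \R (CRLF or LF only)
DUMMY = "\u263a"   # the smiley placeholder; must not occur in the text


def join_wrapped_lines(text: str) -> str:
    """Join single line breaks with a space, keep paragraph breaks."""
    if DUMMY in text:
        raise ValueError("placeholder character already occurs in the text")
    # Pass 1: two or more newlines in a row become the placeholder.
    step1 = re.sub(f"(?:{NL}){{2,}}", DUMMY, text)
    # Pass 2: every remaining (single) newline becomes a space.
    step2 = re.sub(NL, " ", step1)
    # Pass 3: the placeholder becomes two Windows newlines.
    return step2.replace(DUMMY, "\r\n\r\n")


sample = "Title\r\n\r\nfirst line\r\nsecond line\r\n"
print(repr(join_wrapped_lines(sample)))
```

Note that the final newline of the file is also a single newline, so it ends up as a trailing space.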
-----
FYI: I often add this to my response in regex threads, unless I am sure the original poster has seen it before. Here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:
This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, and will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.
If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.
Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.
-
Hello, @mon-onkel, @Peterjones and All,
Peter, a one-step solution, which could replace your three-step regex, is:
SEARCH (\R)\K\R+|\R
REPLACE ?1\1:\x20
Using, exclusively, the Replace All button
Notes:
- The additional consecutive line-breaks, after a first one, are replaced with the first line-break only. So with \r\n, \n, or \r, depending on the type of file!
- In case of a single line-break (second alternative of the regex), it’ll be replaced with a single space char, due to the conditional replacement structure.
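The same one-pass idea can be sketched in Python, with two caveats: Python's re module supports neither \K nor conditional replacement strings, so this sketch matches the whole run of line-breaks and lets a callback choose the replacement. The pattern \r?\n is an assumed stand-in for \R (CRLF and LF only), and the names replace_break and join_one_pass are hypothetical.

```python
import re

NL = r"\r?\n"  # assumed stand-in for \R (CRLF or LF only)

# Alternative 1: a first line-break (group 1) followed by one or more others.
# Alternative 2: a single line-break.
PATTERN = re.compile(f"({NL})(?:{NL})+|{NL}")


def replace_break(match: re.Match) -> str:
    first = match.group(1)
    if first is not None:
        # Run of line-breaks: keep the first one and add one copy of it,
        # which mirrors "\K ... replace with \1" in the Notepad++ regex.
        return first + first
    # Single line-break: becomes a space.
    return " "


def join_one_pass(text: str) -> str:
    return PATTERN.sub(replace_break, text)


print(repr(join_one_pass("Title\r\n\r\nfirst line\r\nsecond line\r\n")))
```

Because the replacement reuses the captured line-break, an LF-only file keeps LF paragraph breaks and a CRLF file keeps CRLF ones.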
Remark:
You have to be careful when using the \R syntax. Indeed, as \R may match either \n, \r or \r\n, and as the regex engine tries, by all means, to find an overall match with the backtracking mechanism, some false positive matches may occur :-((
So, rather than considering which character should not occur after a line-break \R, we could just define which characters must occur before and after a line-break. This gives the simple regex:
SEARCH (?-s).\K\R(?=.)
REPLACE \x20
which means: replace any line-break that is both preceded and followed by a standard character with a space character
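A rough Python equivalent of that last regex, offered as a sketch under assumptions: since Python lacks \K and \R, a lookbehind replaces \K, \r?\n stands in for \R, and the class [^\r\n] plays the role of the "standard character" matched by the dot. The function name join_inner_breaks is hypothetical.

```python
import re

# A line-break with a non-line-break character on both sides.
# (?<=...) replaces \K, and [^\r\n] approximates "." with (?-s).
PATTERN = re.compile(r"(?<=[^\r\n])\r?\n(?=[^\r\n])")


def join_inner_breaks(text: str) -> str:
    return PATTERN.sub(" ", text)


sample = "Title\r\n\r\nfirst line\r\nsecond line\r\n"
print(repr(join_inner_breaks(sample)))
```

Line-breaks next to a blank line never qualify, so paragraph breaks are left untouched, and the final line-break of the file also survives because nothing follows it.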
Best Regards,
guy038
-
-
Hello @guy038 and All,
I’m sorry that I don’t follow the formatting etiquette in this forum, but as I wrote in my first post I’m not a programmer or even interested in programming, and I hope that I won’t need your help on any more projects. Please bear with me.
@guy038, thank you for your advice, it worked perfectly.
My last (I hope) problem is that I want to remove every mail address in the document.
What I want is
From: Arne Lodin
or
From: Arne Lodin Stockholm, Sweden
What I don’t want is
From: Arne Lodin altitune@comhem.se
or
From: Arne Lodin [altitune@comhem.se] Stockholm, Sweden
So, the mail address is sometimes surrounded by a space and a line-break, and sometimes by two spaces.
-
In the first “What I don’t want” example, the mail address is surrounded by angle brackets (Vs lying down: the less-than and greater-than signs).
-
@Mon-Onkel said:
formatting etiquette
For me, it’s not about “etiquette”. It’s about clarity. When you don’t use the formatting tools of the forum (here, or anyplace else), it makes communication harder – it makes it harder for us to understand what you want, and thus makes it harder for us to help you. Since your goal is presumably “getting help”, I would think you would use any tool at your disposal to make it easier to get help.
For example, since you didn’t bother looking at the page on formatting hints, and thus didn’t include your newest set of data in Markdown to make sure it was interpreted correctly, I cannot know whether your data is:
From: Arne Lodin altitune@comhem.se
or
From: Arne Lodin <altitune@comhem.se>
because both render the same in the forum:
—
From: Arne Lodin altitune@comhem.se
From: Arne Lodin altitune@comhem.se
—
update: oh, I see that you at least caught that, and posted an update while I was typing this up.
I’m not a programmer or even interested in programming
You don’t need to be. But you should be willing to learn how to use the tool, rather than asking people to just do something for you. I understand on the first one, you’re probably just looking for the quick fix. But you’re on to your third search/replace, so I would hope for a willingness to learn.
Ah, well. Here’s one that might do what you want. But there are a lot of edge cases it doesn’t test for, so I normally wouldn’t recommend this without getting a better idea of what your data is really like. Maybe you’ll at least take the advice of “save and back up your data before running this on critical data”, because I cannot guarantee it will do what you want with no unintended side effects.
- FIND = \S+\@\S+\h*
- REPLACE = empty
- MODE = regular expression
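To see what that pattern removes, here is a small Python sketch. It is an approximation: Python's re module has no \h, so [ \t] (space or tab) is assumed in its place, and the sample lines use made-up names and example.com addresses rather than the real data.

```python
import re

# Any run of non-space characters containing "@", plus trailing spaces/tabs.
# [ \t]* is an assumed substitute for Notepad++'s \h* (horizontal whitespace).
EMAIL_LIKE = re.compile(r"\S+@\S+[ \t]*")


def strip_addresses(text: str) -> str:
    return EMAIL_LIKE.sub("", text)


# Hypothetical sample lines, with angle brackets and square brackets.
print(repr(strip_addresses("From: Jane Doe <jane@example.com>")))
print(repr(strip_addresses("From: Jane Doe [jane@example.com]  Springfield")))
```

The pattern is deliberately loose: it removes any whitespace-delimited token containing an @, brackets included, which is one of the edge cases worth checking against real data before a Replace All.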
Good luck.
-
Dear @PeterJones ,
I’m very grateful for your help, and your last advice was spot on.
I’m 78 years old and have no intention of trying to learn a new discipline that I hopefully will not need again. So, again, please bear with me.
-
@Mon-Onkel said:
I’m 78 years old and have no intention of trying to learn a new discipline that I hopefully will not need again
So you are just going to keep posting all of your needs here and waiting for answers? Sorry, this doesn’t really fly. People like to help, but they want to see people LEARN from that help, and apply the techniques to their similar problems. People that demonstrate that they aren’t learning and just keep asking are politely told that with all the help provided already, they should be able to extrapolate and now solve their own problems.
78 is not an excuse. My 91-year-old great-aunt just got her first iPhone, and is teaching my 16-year-old son stuff about it that he didn’t know…
-
Hi, @alan-kilborn,
Just congratulate your great-aunt on being such a modern person!! To be honest, I’m pretty sure I’ll be uncomfortable with most of the new objects of that time, that is…, in 23 years ;-))
BR
guy038