<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Open a pipe-delimited text file on Windows - where one field may contain CRLF]]></title><description><![CDATA[<p dir="auto">I’m using Notepad++ to open a pipe-delimited text file on Windows - one of the fields may contain CRLF.  Notepad++ treats every CRLF as an end-of-line character, which looks odd because the line breaks and continues on the next line whenever it encounters a CRLF.  Is it possible to adjust Notepad++ so that it recognizes the actual end-of-line characters, but leaves alone the data fields that contain CRLF characters?  TIA!</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/23244/open-a-pipe-delimited-text-file-on-windows-where-one-field-may-contain-crlf</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 21:32:24 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/23244.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 15 Jul 2022 15:15:35 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Open a pipe-delimited text file on Windows - where one field may contain CRLF on Mon, 25 Jul 2022 16:16:15 GMT]]></title><description><![CDATA[<p dir="auto">Maybe not exactly what you’re looking for but just fyi. I’ve updated the <a href="https://github.com/BdR76/CSVLint/" rel="nofollow ugc">CSV Lint plug-in</a> to v4.5.3 which includes a new option in the “Reformat Data” dialog to replace any carriage return/line feed characters (CRLF or CR or LF) with a given string, for example <code>&lt;br&gt;</code>.</p>
<p dir="auto">You can install the latest plug-in manually by going to <a href="https://github.com/BdR76/CSVLint/releases" rel="nofollow ugc">the releases page</a>, downloading the zip, and following <a href="https://github.com/BdR76/CSVLint#how-to-install" rel="nofollow ugc">the steps as described here</a>.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/78673</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/78673</guid><dc:creator><![CDATA[Bas de Reuver]]></dc:creator><pubDate>Mon, 25 Jul 2022 16:16:15 GMT</pubDate></item><item><title><![CDATA[Reply to Open a pipe-delimited text file on Windows - where one field may contain CRLF on Fri, 15 Jul 2022 19:02:46 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a> Thanks again Peter - I did not know this about the \ limitation and $ for referring to capture groups.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/78407</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/78407</guid><dc:creator><![CDATA[Hardy Merrill]]></dc:creator><pubDate>Fri, 15 Jul 2022 19:02:46 GMT</pubDate></item><item><title><![CDATA[Reply to Open a pipe-delimited text file on Windows - where one field may contain CRLF on Fri, 15 Jul 2022 18:49:45 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/25813">@Hardy-Merrill</a> said in <a href="/post/78403">Open a pipe-delimited text file on Windows - where one field may contain CRLF</a>:</p>
<blockquote>
<p dir="auto">Replace with: <code>\1</code></p>
</blockquote>
<p dir="auto">BTW: I highly recommend getting out of the habit of using the <code>\#</code> notation for Replace With expressions.  Technically, Notepad++ and the Boost Regex library that it uses for regular expressions don’t specify the <code>\#</code> notation, because that backslash-digit notation is for “backreferences”: see <a href="https://npp-user-manual.org/docs/searching/#capture-groups-and-backreferences" rel="nofollow ugc">npp usermanual</a> and the official <a href="https://www.boost.org/doc/libs/1_78_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.back_references" rel="nofollow ugc">boost search syntax</a>, and notice that notation is not listed in the <a href="https://www.boost.org/doc/libs/1_78_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html" rel="nofollow ugc">boost replacement syntax</a>.</p>
<p dir="auto">Moreover, if you have ten or more matching groups, the backslash-digit will not work as you expect.</p>
<p dir="auto">DATA:</p>
<pre><code class="language-txt">ABCDEFHIJxABCDEFHIJyABCDEFHIJzABCDEFHIJx
</code></pre>
<p dir="auto">FIND = <code>(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)</code><br />
REPLACE = <code>\10</code><br />
result:</p>
<pre><code class="language-txt">A0A0A0A0
</code></pre>
<p dir="auto">… this is because the digit backref only works for digits 1-9.  If you try for a 10th capture group, that notation <strong>will not work</strong>.  What <code>\10</code> literally says is “in the replacement, do a backreference to capture group 1, and then insert a literal <code>0</code> character.”</p>
<p dir="auto">compare that to<br />
REPLACE = <code>$10</code></p>
<pre><code class="language-txt">xyzx
</code></pre>
<p dir="auto">Which does what you expect.  And<br />
REPLACE = <code>${10}0</code></p>
<pre><code class="language-txt">x0y0z0x0
</code></pre>
<p dir="auto">… which lets you specify a literal digit to go in the replacement after the match group number…</p>
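<p dir="auto">To make the three results above concrete, here is a minimal Python sketch. It deliberately does not use Boost's replacement syntax (Python's own <code>re</code> module reads backslash-digit replacements differently), so each replacement is written as a function that spells out what the post says the notation means. The variable names are only illustrative.</p>

```python
import re

DATA = "ABCDEFHIJxABCDEFHIJyABCDEFHIJzABCDEFHIJx"

# Ten single-character capture groups, same as the FIND expression above.
TEN_GROUPS = re.compile(r"(.)" * 10)

# What \10 ends up meaning in the post: group 1, then a literal "0".
group1_then_zero = TEN_GROUPS.sub(lambda m: m.group(1) + "0", DATA)

# What $10 means: the contents of the tenth group.
group10 = TEN_GROUPS.sub(lambda m: m.group(10), DATA)

# What ${10}0 means: the tenth group, then a literal "0".
group10_then_zero = TEN_GROUPS.sub(lambda m: m.group(10) + "0", DATA)

print(group1_then_zero)
print(group10)
print(group10_then_zero)
```

<p dir="auto">The three printed lines should line up with the three result boxes above, which is the whole point: the braces remove any doubt about where the group number ends and the literal text begins.</p>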
<p dir="auto">If you get into the habit of <code>\1</code>, you will hit the end of your rope with nine capture groups.  If you always use <code>$1</code>, then when you expand to 10 or more groups, it will still work.  And if you get used to always using <code>${1}</code>, then when you expand to 10 or more, it won’t ever matter what the next literal character is in the replacement: it will always work as expected.  This is why I try to make my example regex substitutions always use <code>${1}</code> (and when I get lazy, they will occasionally just be <code>$1</code>), but I <em>never</em> use <code>\1</code> in replacements, because it is <em>not</em> the canonical way to refer to a capture group’s contents in the replacement.  Saying “But \1 works for me, so I will just keep on using it” will eventually lead to “why doesn’t \10 work the way I want it to?” after you’ve forgotten this caveat because of thousands of repetitions putting <code>\1</code> into muscle memory.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/78406</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/78406</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Fri, 15 Jul 2022 18:49:45 GMT</pubDate></item><item><title><![CDATA[Reply to Open a pipe-delimited text file on Windows - where one field may contain CRLF on Fri, 15 Jul 2022 18:31:36 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/25813">@Hardy-Merrill</a> said in <a href="/post/78403">Open a pipe-delimited text file on Windows - where one field may contain CRLF</a>:</p>
<blockquote>
<p dir="auto">But notice that eee has a CRLF actually as part of the text between the 2nd and 3rd e, so what I see in Notepad++ is this:</p>
<pre><code class="language-txt">{aaa}|bbb|cccCRLF
{ddd}|eeCRLF
e|fffCRLF
{ggg}|hhh|iiiCRLF
</code></pre>
</blockquote>
<p dir="auto"><em><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a> mumbles something about poorly designed use of CSV format, feeling sorry for any who have to use it</em></p>
<blockquote>
<p dir="auto">To fix that, the Find and Replace that I used was this:</p>
<p dir="auto">Find what: \r\n([^{])<br />
Replace with: \1</p>
</blockquote>
<p dir="auto">Good job.  Glad you came up with that.  It helped that the first field in a record always starts with <code>{</code>.</p>
<blockquote>
<p dir="auto">As I’m thinking about this - another question - will this replace ALL the instances found in a line (what if a text value contains more than one CRLF), or only the first one?  Thanks again.</p>
</blockquote>
<pre><code class="language-txt">{aaa}|bbb|ccc
{ddd}|ee
e|fff
{ggg}|h
h
h|iii
</code></pre>
<p dir="auto">so the “eeCRLFe” has one newline, and “hCRLFhCRLFh” has two newlines</p>
<p dir="auto">A single REPLACE ALL, given your regex, will end up with</p>
<pre><code class="language-txt">{aaa}|bbb|ccc
{ddd}|eee|fff
{ggg}|hhh|iii
</code></pre>
<p dir="auto">I believe that answers your “another question”.</p>
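<p dir="auto">For anyone who wants to check this outside Notepad++, here is a rough Python sketch of the same search-and-replace. It assumes, as in the sample data, that every record starts with <code>{</code>; Python's <code>re</code> module stands in for the Boost engine, and the names are just for illustration.</p>

```python
import re

# Find what: \r\n([^{])   Replace with: the captured character.
# Assumption from the thread: the first field of every record starts with "{".
EMBEDDED_CRLF = re.compile(r"\r\n([^{])")

broken = (
    "{aaa}|bbb|ccc\r\n"
    "{ddd}|ee\r\ne|fff\r\n"
    "{ggg}|h\r\nh\r\nh|iii\r\n"
)

# subn replaces every match in one pass, like REPLACE ALL, and reports the count.
fixed, replacements = EMBEDDED_CRLF.subn(lambda m: m.group(1), broken)

print(replacements)
print(fixed)
```

<p dir="auto">One caveat with this sketch: two CRLFs directly in a row inside a field would need a second pass, because the character captured after the first CRLF is the CR of the second one.</p>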
]]></description><link>https://community.notepad-plus-plus.org/post/78405</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/78405</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Fri, 15 Jul 2022 18:31:36 GMT</pubDate></item><item><title><![CDATA[Reply to Open a pipe-delimited text file on Windows - where one field may contain CRLF on Fri, 15 Jul 2022 17:55:25 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a> Thanks Peter!  This got me thinking in the right direction.  Here’s an example - the file I’m opening in Notepad++ looks like this I believe (displaying special characters like CRLF):</p>
<p dir="auto">{aaa}|bbb|cccCRLF<br />
{ddd}|eeCRLFe|fffCRLF<br />
{ggg}|hhh|iiiCRLF</p>
<p dir="auto">But notice that eee has a CRLF actually as part of the text between the 2nd and 3rd e, so what I see in Notepad++ is this:</p>
<p dir="auto">{aaa}|bbb|cccCRLF<br />
{ddd}|eeCRLF<br />
e|fffCRLF<br />
{ggg}|hhh|iiiCRLF</p>
<p dir="auto">To fix that, the Find and Replace that I used was this:</p>
<p dir="auto">Find what: \r\n([^{])<br />
Replace with: \1<br />
Match case: unchecked<br />
Wrap around: unchecked<br />
Search mode<br />
Normal: unselected<br />
Extended: unselected<br />
Regular expression: SELECTED<br />
. matches newline: unselected</p>
<p dir="auto">The find matches lines that end with CRLF where the next character (appears on the next line in Notepad++ although I believe it is on the same line in the file) is NOT left brace.  Replace that with the captured next character on the next line that is NOT left brace - effectively removing the CRLF that came in as part of the text.</p>
<p dir="auto">Hope this makes sense.  Thanks for your help!</p>
<p dir="auto">As I’m thinking about this - another question - will this replace ALL the instances found in a line (what if a text value contains more than one CRLF), or only the first one?  Thanks again.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/78403</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/78403</guid><dc:creator><![CDATA[Hardy Merrill]]></dc:creator><pubDate>Fri, 15 Jul 2022 17:55:25 GMT</pubDate></item><item><title><![CDATA[Reply to Open a pipe-delimited text file on Windows - where one field may contain CRLF on Fri, 15 Jul 2022 16:45:23 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/17261">@Bas-de-Reuver</a> ,</p>
<p dir="auto">I was just thinking: could you add an option to CSV Lint that would hook up two callbacks:</p>
<ul>
<li>On load, replace newline that’s inside a field (ie, inside matched quotes) with the pilcrow (so that it displays nice)</li>
<li>on save, replace pilcrow with a real newline (so that it has the real data whenever it’s saved to disk)</li>
</ul>
<p dir="auto">Because CSV files can have valid newlines inside a string field, as shown with the first of my two text boxes… and having your CSV Lint help with displaying that might be a nice feature (it’s not technically linting… but since the linter is there to make nice syntax highlighting for better being able to see your columns in CSV, displaying a newline that’s in a column as a pilcrow might be an optional “display my CSV better for me” feature).</p>
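<p dir="auto">A minimal sketch of the two transformations, in plain Python rather than as actual plugin or PythonScript callbacks. The function names <code>on_load</code> and <code>on_save</code> are hypothetical, and the sketch assumes quotes are properly matched and that the data contains no pilcrows of its own.</p>

```python
import re

PILCROW = "\u00b6"
ANY_NEWLINE = re.compile(r"\r\n|\r|\n")


def on_load(text, quote='"'):
    """Show newlines inside quoted fields as pilcrows (hypothetical helper)."""
    pieces = text.split(quote)
    # Odd-numbered pieces sit between an opening and a closing quote.
    for index in range(1, len(pieces), 2):
        pieces[index] = ANY_NEWLINE.sub(PILCROW, pieces[index])
    return quote.join(pieces)


def on_save(text, newline="\r\n"):
    """Turn every pilcrow back into a real newline (hypothetical helper)."""
    return text.replace(PILCROW, newline)


original = 'record2|"This has a newline.\r\nSee."|5678\r\n'
displayed = on_load(original)
restored = on_save(displayed)
print(displayed)
```

<p dir="auto">Note that <code>on_save</code> always writes the same newline sequence, so a field that originally mixed CRLF with a bare LF would not round-trip exactly. A real implementation would have to decide how to handle that.</p>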
<p dir="auto">If I find some time later today, I might hack up a PythonScript example of what I am talking about, in case you still don’t understand, or want a sequence that might help.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/78402</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/78402</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Fri, 15 Jul 2022 16:45:23 GMT</pubDate></item><item><title><![CDATA[Reply to Open a pipe-delimited text file on Windows - where one field may contain CRLF on Fri, 15 Jul 2022 16:31:27 GMT]]></title><description><![CDATA[<p dir="auto">I’m not exactly clear on what OP is asking. Notepad++ is a textfile editor and so it will always display the textfile as-is, including all the CRLF’s.</p>
<p dir="auto">Like <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a> pointed out, you can install the <a href="https://github.com/BdR76/CSVLint" rel="nofollow ugc">CSV Lint plug-in</a>, which will add different colors to the columns, making it easier to see what is what (<a href="https://www.youtube.com/watch?v=k6w5BcaSqHc" rel="nofollow ugc">see video</a>). But it will still display the file as just a text file.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/78400</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/78400</guid><dc:creator><![CDATA[Bas de Reuver]]></dc:creator><pubDate>Fri, 15 Jul 2022 16:31:27 GMT</pubDate></item><item><title><![CDATA[Reply to Open a pipe-delimited text file on Windows - where one field may contain CRLF on Fri, 15 Jul 2022 15:55:09 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/25813">@Hardy-Merrill</a> ,</p>
<p dir="auto">Notepad++ is a text editor.  The <em>text</em> of your character-separated-values (CSV) file contains newlines, and Notepad++ is properly rendering them as newlines.  Now from a spreadsheet/database perspective, the two newlines may have two different meanings (the ones in the fields are data/string newlines, which you seem to want to treat as not being newlines, whereas the ones at the ends of records are being used as a record separator, which you seem to want to treat as real newlines).</p>
<p dir="auto">However, you can work around it while you are editing.</p>
<p dir="auto">I am going to assume you have the following valid CSV, which has quotes around any field containing newlines:</p>
<pre><code class="language-txt">record1|"This is a long string without newlines"|1234
record2|"This has a newline.
See."|5678
</code></pre>
<p dir="auto">The newline after <code>1234</code> is a record separator newline.  The newline after <code>newline.</code> is a string newline.</p>
<p dir="auto">So to “hide” the newlines in the data, you could do a search like the following:</p>
<ul>
<li>FIND = <code>(\|"[^\r\n"]*)\r\n</code> – this says look for a pipe, then a quote, then multiple characters that are not CR or LF or quote, store those in group1, then a newline sequence</li>
<li>REPLACE = <code>${1}¶</code> – this says replace that whole sequence with the contents of group1, followed by a pilcrow symbol.</li>
<li>SEARCH MODE = Regular expression</li>
<li>REPLACE ALL</li>
</ul>
<p dir="auto">That will give you</p>
<pre><code class="language-txt">record1|"This is a long string without newlines"|1234
record2|"This has a newline.¶See."|5678
</code></pre>
<p dir="auto">so now you can see the individual records, and know which fields contain newline sequences, without it breaking it up to look funny for you.  (If you have a field with multiple newlines, you can just run the search-and-replace a few more times until it stops matching.  There are complicated ways to get it in one fell swoop, but why debug and maintain a complicated regex when you can just hit REPLACE ALL on a simple regex a few times and be done.)</p>
<p dir="auto">When you are done, and you want to go back to a valid pipe-delimited CSV, you can convert <code>¶</code> to <code>\r\n</code> with another regex search-and-replace.</p>
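<p dir="auto">The same round trip can be sketched in Python, with the <code>re</code> module standing in for Notepad++'s regex engine. The helper names are made up for this example; the loop plays the role of pressing REPLACE ALL until nothing matches.</p>

```python
import re

# Same pattern as the FIND expression above: pipe, quote, then characters
# that are not CR, LF, or quote (captured as group 1), then a CRLF.
FIELD_NEWLINE = re.compile(r'(\|"[^\r\n"]*)\r\n')
PILCROW = "\u00b6"


def hide_field_newlines(text):
    """Repeat the replacement until it stops matching."""
    while True:
        text, count = FIELD_NEWLINE.subn(lambda m: m.group(1) + PILCROW, text)
        if count == 0:
            return text


def restore_field_newlines(text):
    """Convert every pilcrow back to CRLF before saving."""
    return text.replace(PILCROW, "\r\n")


sample = (
    'record1|"This is a long string without newlines"|1234\r\n'
    'record2|"This has a newline.\r\nSee."|5678\r\n'
)
hidden = hide_field_newlines(sample)
print(hidden)
```

<p dir="auto">As with the manual steps, the restore step converts every pilcrow, so this only works if the original data never contains a pilcrow of its own.</p>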
<p dir="auto"><em>Et voila</em>, it’s usable for you, without requiring a huge coding change and having to wait for the developer to implement the niche feature for you (which wasn’t likely to happen even with an official feature request) and without waiting for someone to develop a plugin that does it for you – though it wouldn’t be unreasonable to see if <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/17261">@Bas-de-Reuver</a> wanted to add such a feature to the <a href="https://community.notepad-plus-plus.org/topic/22061/new-plugin-csv-lint/">CSV Lint</a> plugin – because I can see that your feature (automatically change embedded newlines to pilcrow on file read, and automatically change pilcrow to embedded newlines on file save) would be useful to pretty much the same set of users that CSV Lint is useful for.</p>
<p dir="auto">(NB: if you have an invalid CSV, where fields containing newlines don’t have quotes around them, this won’t work.  If you have more complicated field data, you might have to research regex and come up with a more complicated regex on your own.)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/78399</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/78399</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Fri, 15 Jul 2022 15:55:09 GMT</pubDate></item></channel></rss>