Find & Replace LF not working across whole document
-
I have a large pipe delimited file (>300,000 lines) that is extracted from our ordering system. Because part of what is extracted includes free text inputted by our end users, there are a lot of cases where the rows are broken up, which I can see is caused by a Line Feed, rather than CRLF which is at the genuine end of a line.
I’ve been using (?<!\r)\n to find the sole LFs, and replace them with a space, but it seems like NP++ is only searching a certain number of lines before giving up. If I scroll down further, and search again, it will find a new batch, then give up again.
Is there any way to get the search to go through the whole document? I’ve tried selecting everything, but that isn’t working either.
-
Hello, @by-eck and All,
You do not need to use any regular expression ! Simply :
- Choose the menu option Edit > EOL Conversion
- Click on the Windows (CR LF) choice
- If the Windows (CR LF) choice is greyed, choose any other choice first. Then, select the Edit > EOL Conversion > Windows (CR LF) choice
- Save your modified file
Best Regards,
guy038
-
I read it as OP wants to change errant LF line endings into something other than a CRLF line ending, so I think your proposed solution does “too much”.
Unfortunately, I don’t have any ideas for OP to help with why it might be “only searching a certain number of lines before giving up”.
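To illustrate the “too much” point, here is a minimal Python sketch. It assumes that converting to Windows EOLs rewrites every line ending (CRLF, lone CR, or lone LF) as CRLF; the sample rows and both regexes are illustrative stand-ins, not Notepad++ internals.

```python
import re

# Hypothetical two-row extract: row 2 has a stray LF inside its free-text field.
sample = "1|ok|done\r\n2|broken\ndescription|done\r\n"

# Assumption: an EOL conversion turns every kind of line ending into CRLF.
converted = re.sub(r"\r\n|\r|\n", "\r\n", sample)

# What OP is after instead: only the lone LF becomes a space.
joined = re.sub(r"(?<!\r)\n", " ", sample)

# Split on the genuine row terminator and drop the empty tail.
rows_converted = [r for r in converted.split("\r\n") if r]
rows_joined = [r for r in joined.split("\r\n") if r]

print(len(rows_converted))  # 3: the broken row is now split into two rows
print(len(rows_joined))     # 2: the broken row is whole again
```

Under that assumption, the conversion makes the file consistent but leaves the row count wrong, which is the opposite of what OP wants.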
-
@guy038 I think @alan-kilborn is right here, the LF shouldn’t be there at all. I’ve attached a redacted screenshot.
Line 75510 continues to the end, and has CR LF, which is correct, but it looks like when people have been filling in the description for line 75511, they’ve hit return, which has caused the errant LFs to appear.
(?<!\r)\n finds them, but only if I’m within about 250 lines of it, so “Replace All” won’t fix anything.
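As a cross-check outside the editor, the same pattern can be run over a synthetic file of similar size. This is only a sketch: the row contents are made up, and Python's `re` module is a different regex engine from the one in Notepad++, so it says nothing about why the editor stops early. It only confirms that the pattern itself has no distance limit.

```python
import re

# Same pattern as in the thread: an LF that is not preceded by a CR.
LONE_LF = re.compile(r"(?<!\r)\n")

# Made-up rows: one well-formed, one with two stray LFs in the description.
good_row = "75510|some description|more\r\n"
broken_row = "75511|user hit\nreturn\nhere|more\r\n"

# Put the broken rows far apart, tens of thousands of lines from the start.
text = good_row * 75510 + broken_row + good_row * 1000 + broken_row

matches = [m.start() for m in LONE_LF.finditer(text)]
fixed = LONE_LF.sub(" ", text)

print(len(matches))                              # 4 lone LFs found in one pass
print(fixed.count("\n") == fixed.count("\r\n"))  # True: every LF left is part of a CRLF
```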
-
@by-eck ,
I have vague recollections that there was some older version which had EOL-Search-and-replace issues a long time ago. Could you please share your ?-menu’s Debug Info by clicking on the “Copy debug info into clipboard” in that dialog and pasting it here?
I ask about version, because with modern v8.4, I have no problem having the cursor at line 1 of a >75k line file and having it find and replace the LF into something else (whether it’s CRLF or :: or whatever). I did up a little screen-grab video to show it working: I replicated your line 75510 those 75510 times, and then made a 75511 and following that matched what you showed. Whether I start from the end of the file or the beginning, it can count the matches correctly, and starting from the beginning of the file, I can do single Find Next / Replace, or I can do Replace All, and all four instances get replaced. (My video shows FIND = (?<!\r)\n , REPLACE = :: , SEARCH MODE = regular expression.)
Notepad++ v8.4 (64-bit)
Build time : Apr 20 2022 - 03:31:06
Path : C:\usr\local\apps\notepad++\notepad++.exe
Command Line :
Admin mode : OFF
Local Conf mode : OFF
Cloud Config : OFF
OS Name : Windows 10 Enterprise (64-bit)
OS Version : 2009
OS Build : 19042.1586
Current ANSI codepage : 1252
Plugins : AutoSave.dll ComparePlugin.dll DSpellCheck.dll EnhanceAnyLexer.dll ExtSettings.dll MarkdownViewerPlusPlus.dll mimeTools.dll NppConsole.dll NppConverter.dll NppEditorConfig.dll NppExec.dll NppExport.dll NppFTP.dll NppLspClient.dll NppUISpy.dll PreviewHTML.dll PythonScript.dll QuickText.dll TagLEET.dll _CustomizeToolbar.dll
So my guess is that you’ve got an old version of Notepad++, and if you updated to a modern version, you wouldn’t have that problem.
But I also think that you don’t understand Character-Separated-Values (CSV) files and the intention of your data (in your case, the | character is the “Character” from “CSV”). If CRLF is the EOL character for a CSV, a single data field (between the | ) is allowed to have something like a bare LF. In fact, that’s how MS Excel encodes a newline embedded in your cell, so that when it saves it as a comma-based CSV, it can have multiline data without making an invalid CSV. A secondary way of embedding newlines in valid CSV is to put quotes around the contents of the cell… and everything in the balanced quotes is part of that cell. So a 3x3 comma-based CSV file might be:
one,two,three
first,"second
with embedded newline",third
x,y,z
… in that data, cell(2,2) – which Excel would call B2 – contains a multiline piece of data.
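For anyone who wants to see how a CSV-aware reader treats that kind of data, here is a small sketch using Python's standard `csv` module. The string literal is my reading of the 3x3 example above, with CRLF as the row terminator and one bare LF inside the quoted cell; where exactly the embedded newline falls is an assumption.

```python
import csv
import io

# The 3x3 example, with CRLF ending each row and a bare LF inside the quoted cell.
raw = 'one,two,three\r\nfirst,"second\nwith embedded newline",third\r\nx,y,z\r\n'

# newline="" hands line endings to the csv module untranslated,
# as the csv documentation recommends for file objects.
rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(rows))          # 3 rows, not 4
print(repr(rows[1][1]))   # 'second\nwith embedded newline'
```

For a pipe-delimited extract, the same reader accepts `delimiter="|"`. Note that this only works when the field is quoted; an unquoted bare LF would still be read as a row break by this parser.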
If I were you, I would talk with whoever generated your CSV, to make sure that you properly understand the intent of the data, rather than blindly assuming you need to “fix” the data. And, if it does end up that you do need to “fix” the data, then Notepad++ can handle that task.
-
Here’s the debug, I’m on v8.4 so I don’t think the version is the issue.
Notepad++ v8.4 (64-bit)
Build time : Apr 20 2022 - 03:31:06
Path : C:\Program Files\Notepad++\notepad++.exe
Command Line :
Admin mode : OFF
Local Conf mode : OFF
Cloud Config : OFF
OS Name : Windows 10 Enterprise (64-bit)
OS Version : 1909
OS Build : 18363.2037
Current ANSI codepage : 1252
Plugins : mimeTools.dll NppConverter.dll NppExport.dll
-
@by-eck ,
Okay, so it’s not the version.
As my video shows, it works correctly for me in v8.4, even when I start from the beginning of the file. The only other thing I can think of is that you’re running out of memory or something.
Other than that, I guess I’ll bow out, and see if someone else has other ideas why it would work fine for me and not for you.
-
@by-eck I’m late to this, but I wonder if Notepad++ is the right tool for this job. If you have to do it more than once - a simple automatic text editor like “sed” would be more suitable.
It also resembles work I’ve had to do to copy database data to a format that pastes into Microsoft Excel out of “SQL Server Management Studio”. In that case, I wrote a fairly simple SQL user-defined function, whose output is what I want Excel to receive, starting with changing ‘01’ to =“01” so that Excel displays 01 instead of the number 1 - but also removing in-data line breaks and tabs and anything else that annoyed me. Oh, I think it fixes dates. So the output of the data process didn’t contain those hiccups.
I did that as paid work so I guess my employer owns it, but they may agree to sharing it with the world. Would you like me to ask?
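If the job does have to be repeated, a scripted approach along the lines of the sed suggestion could look roughly like this in Python. It is a sketch under a few assumptions: the function name `join_lone_lf` and the file paths are hypothetical, genuine rows end in CRLF, and the encoding is ASCII-compatible (such as Windows-1252 or UTF-8) so that working on raw bytes is safe.

```python
from pathlib import Path


def join_lone_lf(src: Path, dst: Path, replacement: bytes = b" ") -> int:
    """Copy src to dst, replacing each LF not preceded by CR.

    Returns the number of lone LFs that were replaced.
    """
    replaced = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # In binary mode, iteration splits on b"\n" only, so each piece ends
        # with CRLF (a genuine row end), a lone LF, or nothing (last piece).
        for line in fin:
            if line.endswith(b"\n") and not line.endswith(b"\r\n"):
                fout.write(line[:-1] + replacement)
                replaced += 1
            else:
                fout.write(line)
    return replaced
```

Called as, say, `join_lone_lf(Path("extract.txt"), Path("extract_fixed.txt"))`, it reads the file one piece at a time, so the size of the extract should not matter much.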