Multi-Line Find and Replace

Dick Adams

I’m a brand new user, just downloaded NPP today. I wanted to give it a test drive with a simple find operation, but can’t get it to work. I want to search for this multi-line string:

<?xml version=“1.0” encoding=“utf-8” ?>\r\n
<!DOCTYPE html PUBLIC “-//W3C//DTD XHTML 1.1//EN” “http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd”>\r\n
<html xmlns=“http://www.w3.org/1999/xhtml” lang=“en-us”>

I have a folder selected with six files, and these 3 lines are at the top of each file. The Extended Search Mode box is checked, but the search returns no hits.

Please excuse the newbie question. I tried to find documentation, but spent 15 minutes looking for a user manual without finding one. Is there an online manual that would answer this kind of question? If not, can someone help me out?

I’m also unclear if I’ll receive a notification when someone replies to this question. Do I have to keep checking back on the Web site, or can I get an e-mail notice?

PeterJones

@Dick-Adams, Welcome to Notepad++ and the Community forum.

Taking things out of order:

I’m also unclear if I’ll receive a notification when someone replies to this question.

This forum is not set up to email you. You have to come back and check. Sorry.

Is there an online manual that would answer this kind of question?

See my boilerplate, below, for some regex help – though it focuses on full regular expressions, not Extended Mode’s limited escape sequences. There isn’t much on the “extended mode” – though the outdated NpWiki++ has a section on it.

test drive with a simple find operation, but can’t get it to work.

That’s unfortunate. Let’s see if we can help.

I have a folder selected with six files, and these 3 lines are at the top of each file.

Let’s actually start with just one file. It’s much easier to debug search/replace issues.

I assume that the forum converted your real valid quotes into “pretty” smart quotes (see the boilerplate for how to avoid this), so I am using normal quotes in my examples below.

Assume the following file:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">
<body>
<h1>header</h1>
</body>
</html>

(You can copy/paste that into a new Notepad++ tab; you don’t even need to save the file to replicate my results, below.)

And the following Find what: <?xml version="1.0" encoding="utf-8" ?>\r\n<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\r\n<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us"> (all as one line in the search box – just copy/paste all the highlighted monospace text into your find what).

When I do that in extended mode, it finds the first three lines, just like I would expect.

With that same file and search string, is your copy of Notepad++ able to find the first three lines?

Also, please be careful of line endings. Your search \r\n implies DOS-style CRLF line endings, but many HTML files are written with Linux-style LF-only line endings. If you are unsure, you can look at the status bar near the lower right, which will show the line-ending style: if it shows Unix (LF), you will need to use \n instead of \r\n in your extended search pattern. Alternatively, View > Show Symbol > Show All Characters will put CR and LF in little black boxes. This has the added benefit of showing you whether you accidentally have extra spaces or tabs at the end of a line before the newline sequence (look for lightly colored dots or arrows indicating spaces or tabs).
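If you would rather check the line endings outside of Notepad++, a minimal Python sketch along these lines can count them. The function name `count_line_endings` and the sample bytes are made up for illustration; nothing here is part of Notepad++.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR sequences in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF not preceded by CR
    cr = data.count(b"\r") - crlf  # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


# Hypothetical sample: two lines saved with Unix (LF) endings.
sample = b"line one\nline two\n"
print(count_line_endings(sample))
```

For a real file, pass in the result of `open(path, "rb").read()`. If the LF count is nonzero and the CRLF count is zero, the extended search pattern should use \n rather than \r\n.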

-----
FYI: I often add this to my response in regex threads, unless I am sure the original poster has seen it before. Here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.
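If text has already been through the forum's smart-quote conversion and you want to paste it back into a search box, something like the following Python sketch can straighten the quotes again. It is only an illustration (the name `straighten_quotes` is made up), and it does not recover dashes or lost backslashes.

```python
# Map the curly quotes the forum substitutes back to plain ASCII quotes.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
})


def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(SMART_QUOTES)


print(straighten_quotes("<?xml version=\u201c1.0\u201d encoding=\u201cutf-8\u201d ?>"))
```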

If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.

Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

Dick Adams

Let me explain where I’m headed with this, so you can understand my constraints.

I’m in the middle of a large project to convert a Web site from XHTML to HTML5. There are numerous files (over 10,000), so doing a manual find-and-replace on each one isn’t feasible. In fact, that’s what I was already doing in my IDE, Microsoft Expression Web 4 (MEW4). It worked, but it quickly became obvious it would take far too long to do the entire site. My other reason for not using MEW4 is that it garbles some multi-byte characters during the find-and-replace process (e.g., Unicode character U+1F50A).

So I went looking on the Web for a good text editing tool. I’d heard of NP++ in the past, so I thought I’d check it out first.

There are only 4 separate find & replace operations needed to convert one of my files to HTML 5. Some involve replacing multiple lines with a single line (like the example I started this thread with), others delete pieces of a line, etc.

Ideally, what I’m looking for is a procedure where I can specify a single folder (probably my top level HTML folder), and perform the operations recursively in that folder & all sub-folders. I’d like to leave the original files untouched, so being able to specify the output location would be a plus.

This is a long answer to a short question, and doesn’t begin to address the find-and-replace syntax issue. But I thought it might make my situation easier to understand.

Maybe I’m barking up the wrong tree in the first place? Is NP++ a good choice to do this kind of conversion?

Alan Kilborn

@Dick-Adams said:

Is NP++ a good choice to do this kind of conversion?

Yep.

so doing a manual find-and-replace on each one isn’t feasible

So use Notepad++'s Replace In Files feature (found on the Find window’s Find in Files tab).

what I’m looking for is a procedure where I can specify a single folder (probably my top level HTML folder), and perform the operations recursively in that folder & all sub-folders

Yep, Replace In Files has an In all sub-folders checkbox. It also has a Directory box so you can put your top-level folder there.

I’d like to leave the original files untouched, so being able to specify the output location would be a plus

The easy but slightly manual solution is to make a copy of the whole tree and only manipulate one of the (now) two copies with Notepad++. This also is conducive to using a file/dir compare utility post-replacement to verify a few files for correct content.
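For anyone who wants to script the same copy-then-replace idea outside of Notepad++, here is a rough Python sketch. It is not how Replace In Files works internally. The replacement header `NEW_HEADER` is a placeholder, since the thread never says what the three lines should become, and the function name `replace_in_copy` and the `*.html` filter are also assumptions. It works on raw bytes so multi-byte characters are never re-encoded, and it accepts either CRLF or LF between the header lines.

```python
import re
import shutil
from pathlib import Path

# The three XHTML header lines from the first post, as raw bytes.
HEADER_LINES = [
    b'<?xml version="1.0" encoding="utf-8" ?>',
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
    b'"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">',
    b'<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">',
]
# Escape each line literally and allow either CRLF or LF between them.
OLD_HEADER = re.compile(rb"\r?\n".join(re.escape(line) for line in HEADER_LINES))

# ASSUMPTION: the thread never states the replacement text; this is a placeholder.
NEW_HEADER = b'<!DOCTYPE html>\n<html lang="en-us">'


def replace_in_copy(src: Path, dst: Path, glob: str = "*.html") -> int:
    """Copy src to dst, replace the header in the copy, return files changed."""
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    changed = 0
    for path in dst.rglob(glob):  # rglob walks all sub-folders
        if not path.is_file():
            continue
        data = path.read_bytes()  # bytes in, bytes out: no re-encoding
        new_data, hits = OLD_HEADER.subn(lambda m: NEW_HEADER, data)
        if hits:
            path.write_bytes(new_data)
            changed += 1
    return changed
```

A call such as `replace_in_copy(Path("site"), Path("site_html5"))` (both folder names hypothetical) leaves the original tree untouched and returns how many files in the copy were changed.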

I thought it might make my situation easier to understand.

Yep, we get it. We’ve done it. We’ve lived it. :)

PeterJones

@Dick-Adams said:

There are numerous files (over 10,000),

Great. As I said, “let’s start with”… When solving a problem, it’s a really good idea to simplify it as far as possible, then slowly build it back up to your final solution.

Until you get the replacement working in one file, trying to replace in 10,000 will be difficult (and runs the risk of losing a lot more data if you do something wrong). After you have a working search/replace, then you work on getting it to work in all 10,000 in the Replace In Files.

So, are you able to get the search (or search/replace) working in one single file? If so, great, we can move on; if not, tell us what’s going wrong.

PeterJones

@Alan-Kilborn said:

@Dick-Adams said:

I’d like to leave the original files untouched, so being able to specify the output location would be a plus

The easy but slightly manual solution is to make a copy of the whole tree and only manipulate one of the (now) two copies with Notepad++. This also is conducive to using a file/dir compare utility post-replacement to verify a few files for correct content.

If we’re talking about 10,000 files, real version control seems a must (IMO). Check it into a Git or SVN repository (or whatever VCS is appropriate to your situation). Once you’ve got a safe history, then start mucking around with the 10k files.

Dick Adams

It’s looking more hopeful now. I finally got it to update all six files in my test folder. I think it was mostly an operator head space & timing issue (getting used to new interface).

I still have some (fairly minor) questions. For example, what does Follow current doc. mean? What is the purpose of the transparency option? etc. Some built-in help or tool tips would be really useful for new users. (I say this coming from 30 years in software design.)

Alan Kilborn

@Dick-Adams

Follow current doc -> When ticked, when you press Shift+Ctrl+F to invoke Find in Files, it will grab the current document’s directory and shove it in the Directory box. When unticked, whatever you last had (minutes, days, weeks…ago) remains untouched in Directory.

Transparency -> super simple, just a visual setting on the Find window