Combine 2 searches

HaPe Krummen

The more I use RegExes, the more I like them … and the more ideas come up.

Is it possible to search a document for a certain name, store it in a variable ($x), and use it as the replacement ($x) in a second search on the same document?

I have many htm documents with a reference number in the first row. I’d like to place this information further down, replacing a document name.

Can this be done directly or with a macro?

PeterJones

@HaPe-Krummen ,

Regexes are agnostic of any previous regexes run. And macros don’t have any concept of “state” or “variable”. So the simple answer to your question is “no”.

However, if your document isn’t too big, it is often possible to match the name, store it in a capture group, and do the replacement elsewhere in the same find/replace regular expression pair. Since it sounds like you only want to do one replacement, it might be doable with your data (depending on what your data really looks like).

For example,

wanted_name: xyzzy
blah
blah
new location: placeholder

FIND = (?-s)^wanted_name: (.*)$(?s:.*?)new location: \Kplaceholder
REPLACE = $1
SEARCH MODE = Regular Expression
REPLACE ALL

wanted_name: xyzzy
blah
blah
new location: xyzzy

Using this generic pattern, see if you can customize it to match your data rather than my pseudo-data.

(It could get more complicated if the blah blah section is really long, or if you wanted to be able to replace more than one instance of placeholder.)
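
For comparison, roughly the same idea can be written with Python’s re module. This is only a sketch against the pseudo-data above: re does not support \K, so the text in front of placeholder is captured and written back instead, and the function name copy_name_down is made up for this illustration.

```python
import re

# re.MULTILINE makes ^ and $ match at line boundaries.
# Group 1: everything from "wanted_name: " up to and including "new location: ".
# Group 2: the name on the wanted_name line ("." does not cross newlines here).
# (?s:.*?) lazily skips the lines in between, newlines included.
PATTERN = re.compile(
    r"^(wanted_name: (.*)$(?s:.*?)new location: )placeholder",
    re.MULTILINE,
)


def copy_name_down(text):
    # Write group 1 back unchanged, then the captured name instead of "placeholder".
    return PATTERN.sub(r"\g<1>\g<2>", text)
```

Calling copy_name_down on the four-line sample turns the last line into "new location: xyzzy". Like the Notepad++ version, one call handles only the first placeholder after each wanted_name line.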

HaPe Krummen

@PeterJones

thank you for the explanation, Peter. Unfortunately, there are several instances of ‘placeholder’ in the document, so this makes it impossible.

Scenario: I have several thousand xyz.htm documents with corresponding xyz_files directories. These documents were translated to German and I need to rename them to put them in the same directory. After the renaming, the directory should contain one xyz_en.htm file, one xyz_de.htm file, and only one xyz_files directory that holds the helper files for both htm files. Therefore, in both .htm files the references to the helper files should become ./xyz_files.

I looked for a way to do it in one go, but after your explanation I’ll do it with the help of the macro recorder.

Open the file and read the wanted name with Macro Recorder
Start the NP++ replace function (Ctrl+H)
In the search field, enter a regex that finds all instances of the helper_files directory; in the replace field, let Macro Recorder write the replacement info.

I used this before I found out how to use RegExes within NP++, and it works, but it takes about 14h to go through all the files, and macros are not bulletproof. If Windows interferes with something during those 14h, you get unpredictable results …

Mark Olson

@HaPe-Krummen

Unfortunately, there are several instances of ‘placeholder’ in the document, so this makes it impossible.

Sounds like a job for a scripting language. I recommend starting with Python and using its built-in re regex library. Python isn’t great for everything, but it’s great for simple, short scripts that do one or two things.

Another advantage of Python’s re library is that it’s actually many times faster than Notepad++'s Boost regex engine, which can be helpful when you’re processing a large number of files.

EDIT: if you’re using a scripting language, you don’t need to use regular expressions at all, and you could instead use a proper HTML/XML parser to avoid potential nasty consequences of using regex to parse HTML.

This forum isn’t the place to ask for further help on this subject, but I imagine you could probably get decent results if you asked a chatbot like ChatGPT for a script that would solve this problem for you. Just remember the golden rules of doing regex/replace operations:

Always search for all the matches to the regex before doing any replacements, so that you know it matches what you expect it to match.
Try testing the find/replace on some toy data. Think of weird edge cases where your regex might fail.
Create a new copy of the files you want to transform, then do the regex-replace on them. If you are using a version control system like Git, you can usually skip this step.
Verify that nothing untoward happened.
Do the find/replace on your actual files.
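
As a rough illustration of how those rules could look in a Python script (a sketch under assumptions, not a solution for the actual files in this thread): the "ref: ..." first-line format, the OLD_NAME text, UTF-8 encoding, and all function names below are hypothetical stand-ins that would need to be adapted to the real data.

```python
import re
import shutil
from pathlib import Path

# Hypothetical formats, chosen only for illustration:
# the first line looks like "ref: ABC123", and the text to replace is "OLD_NAME".
REF_RE = re.compile(r"^ref: (\S+)")
TARGET_RE = re.compile(r"OLD_NAME")


def preview(text):
    """Search first: list (offset, matched text) pairs without changing anything."""
    return [(m.start(), m.group(0)) for m in TARGET_RE.finditer(text)]


def transform(text):
    """Read the reference from the first line and reuse it for every match.

    Returns (new_text, number_of_replacements).
    """
    first_line = text.split("\n", 1)[0]
    ref = REF_RE.match(first_line)
    if ref is None:
        return text, 0  # no reference found: leave the document untouched
    value = ref.group(1)
    # A function replacement inserts the value literally (no backslash escapes).
    return TARGET_RE.subn(lambda m: value, text)


def run_on_copy(src_dir, work_dir):
    """Copy the files, then transform only the copy; originals stay as they are.

    copytree raises an error if work_dir already exists, which prevents
    accidentally reusing an old working copy.
    """
    shutil.copytree(src_dir, work_dir)
    counts = {}
    for path in sorted(Path(work_dir).rglob("*.htm")):
        text = path.read_text(encoding="utf-8")  # encoding is an assumption
        new_text, n = transform(text)
        if n:
            path.write_text(new_text, encoding="utf-8")
        counts[path.name] = n
    return counts
```

The per-file counts returned by run_on_copy give a quick way to verify that nothing untoward happened: a file with 0 replacements, or with far more than expected, is worth opening before touching the real files.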

PeterJones

@HaPe-Krummen said in Combine 2 searches:

Unfortunately, there are several instances of ‘placeholder’ in the document, so this makes it impossible.

You seem to have misunderstood me. It’s not impossible, just more difficult (or possibly just more tedious).

In my example data, if there were multiple instances of placeholder in the document, you could just run Replace All multiple times (with Wrap Around checkmarked) until it says 0 occurrences were replaced in the replace dialog’s status line. If “several” means 20, that’s not bad. If “several” means 200, then doing a macro for the replacement, then Macro > Run a Macro Multiple Times… is probably your best bet.

(Yes, you can do Find In Files across all the files in the macro recorder, so the Replace in Files will go across every file for each instance of the placeholder.)
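
The "run Replace All until it reports 0 occurrences" loop can be sketched outside Notepad++ as well. The Python below is only an illustration on the pseudo-data from my earlier example; since Python’s re has no \K, the leading text is captured and written back, and the names replace_until_done and max_passes are invented for this sketch.

```python
import re

# Same idea as the FIND expression on the pseudo-data: group 2 is the wanted
# name, group 1 is everything up to the placeholder that gets replaced.
PATTERN = re.compile(
    r"^(wanted_name: (.*)$(?s:.*?)new location: )placeholder",
    re.MULTILINE,
)


def replace_until_done(text, max_passes=1000):
    """Repeat the replacement until a pass changes nothing.

    Each pass fixes one placeholder per wanted_name line, just like one
    click on Replace All. max_passes is a safety stop in case the
    replacement keeps producing new matches.
    """
    passes = 0
    while passes < max_passes:
        text, n = PATTERN.subn(r"\g<1>\g<2>", text)
        if n == 0:
            break
        passes += 1
    return text, passes
```

With three placeholder lines below a single wanted_name line, this takes three passes that replace something, plus a final pass that finds nothing and ends the loop.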

But, like @Mark-Olson suggested, it’s probably time for you to turn from a text editor to a programming language to accomplish your goal, because they are better suited for doing the same-thing-with-variables across multiple files than a text editor (whose focus is one-file-at-a-time, though it provides tools for bulk search-and-replace that help in a lot of cases; just not perfectly suited to your more-complicated needs). (How to implement such an algorithm in your language-of-choice is, of course, not on topic in a Notepad++ forum.)

Or, even better, you can see if the original HTML might have been generated from a database or similar generating-tool. In which case, tweaking the database’s report format would be the better way of generating it en masse. Because any time you’ve got thousands of documents with similar structures, I’m assuming there was some original creation step for those documents that didn’t involve manually typing them all, and so fixing it at the generation step rather than editing after-the-fact is the right thing to do.

HaPe Krummen

@Mark-Olson @PeterJones

Thank you for your help. ChatGPT wrote me a routine that worked with a dozen files in my testfiles directory … now I’m copying all files to a directory to test it tonight with 25000 files.

I’m really surprised how easy this can be.

And I’m reading the code to understand what the script is doing, as I appreciate any help but want to learn :-)