MultiLine Replace (multiple hosts in hostsfile)
rugabunda last edited by rugabunda
See, I can search for one line and replace it, no problem. But I can't just as easily paste in multiple lines from the entire hosts list and replace them. Should be simple, IMO.
Hi, @rugabunda and All,
This problem has been already discussed many times ! So :
You have a first file, named File_A.txt, which contains a list of… whatever you like !
You have a second file, named File_B.txt, which contains the same kind of list, with fewer lines than in File_A.txt
And you would like to delete, in File_A.txt, all the lines which are ALSO present in File_B.txt
Wouldn't you ?
Not difficult with regular expressions ;-)). Note that the following regex S/R compares complete lines, only !
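Here is the method :
- Copy/paste all the contents of the main File_A.txt file in a new tab
- At the end of this main list, add a new line of at least three dashes ( --- )
- Now, append all the contents of the File_B.txt file, AFTER this line
- Open the Replace dialog ( Ctrl + H )
- Type (?-s)^(.+\R)(?=((?s).+?\R)?\1)|(?s)^---.+ in the Find what field and leave the Replace with field empty
- Select the Regular expression search mode
- Click once on the Replace button or several times on the …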
Et voilà ! In the new N++ tab, any line which was present in both files has been deleted, as well as any possible duplicate lines in the main File_A.txt list
For instance, let's consider the File_A.txt contents below :
Smith Oliver
Jones Olivia
TaylorGeorge
Brown Amelia
Williams Harry
Wilson Emily
Johnson Jack
Davies Isla
Davies Isla
Davies Isla
Robinson Jacob
Wright Ava
Green Lily
Thompson Noah
Evans Jessica
Walker Charlie
White Isabella
Roberts Muhammad
Green Lily
Hall Thomas
Wood Ella
Jackson Oscar
Clarke Mia
Note that the complete names Davies Isla and Green Lily have, respectively, 2 and 1 duplicates in that example list !
Then, let's suppose the following File_B.txt contents :
Jackson Oscar
Green Lily
Thompson Noah
Jackson Oscar
Williams Harry
Again, note that the complete name Jackson Oscar has 1 duplicate
Once you have added the separation line and gathered the contents of the two files, with File_A.txt in first position, you'll get, in the new tab, the text :
Smith Oliver
Jones Olivia
TaylorGeorge
Brown Amelia
Williams Harry
Wilson Emily
Johnson Jack
Davies Isla
Davies Isla
Davies Isla
Robinson Jacob
Wright Ava
Green Lily
Thompson Noah
Evans Jessica
Walker Charlie
White Isabella
Roberts Muhammad
Green Lily
Hall Thomas
Wood Ella
Jackson Oscar
Clarke Mia
---
Jackson Oscar
Green Lily
Thompson Noah
Jackson Oscar
Williams Harry
After performing the regex S/R, with (?-s)^(.+\R)(?=((?s).+?\R)?\1)|(?s)^---.+ as the search and an empty replacement, you'll obtain the expected list :
Smith Oliver
Jones Olivia
TaylorGeorge
Brown Amelia
Wilson Emily
Johnson Jack
Davies Isla
Robinson Jacob
Wright Ava
Evans Jessica
Walker Charlie
White Isabella
Roberts Muhammad
Hall Thomas
Wood Ella
Clarke Mia
As usual, at the beginning, the (?-s) modifier means that the dot . character will match any single standard character, and not the EOL ones
Then, the part ^(.+\R) matches, from the beginning of a line, ^, all the characters of any non-blank line, .+, with its line-break characters, \R, stored as group 1, due to the parentheses
But ONLY IF the regex ((?s).+?\R)?\1, contained in a positive look-ahead structure (?=…), can be matched
That is to say, if a second occurrence of group 1 ( the complete line ), \1, is found after an optional block, (....)?, of complete lines, (?s).+?\R, which represents the shortest non-null range of any characters, .+? ( standard or EOL ones ) due to the (?s) modifier, ending with a line-break, \R
Now, when no more duplicates can be found, the regex engine tries the second alternative of the search regex, after the | alternation symbol, that is to say the regex (?s)^---.+, which grabs the longest range of any characters, from the separation line till the very end of the file, due to the (?s) modifier, which means that the dot . matches any single character ( standard and EOL ones )
In replacement, all the lines or block matched are, simply, deleted, as the replacement zone is empty
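For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of that search/replace. It is an adaptation, not the exact regex: Python's `re` module has no `\R`, so `\n` stands in for the line break, and the scoped form `(?s:...)` replaces the inline `(?s)` switch. The function name `subtract_lines` and the sample names are only illustrative.

```python
import re

# Adapted pattern (assumption: every line ends with "\n"):
#   ^(.+\n)                  a complete non-blank line, stored as group 1
#   (?=(?:(?s:.+?)\n)?\1)    only if the same complete line occurs again later
#   |^---(?s:.+)             or the separator line and everything after it
PATTERN = re.compile(r"^(.+\n)(?=(?:(?s:.+?)\n)?\1)|^---(?s:.+)", re.MULTILINE)


def subtract_lines(list_a, list_b):
    """Join both lists around a '---' separator, then delete the matches."""
    combined = "\n".join(list_a) + "\n---\n" + "\n".join(list_b) + "\n"
    return PATTERN.sub("", combined).splitlines()


file_a = ["Smith Oliver", "Williams Harry", "Davies Isla",
          "Davies Isla", "Green Lily", "Clarke Mia"]
file_b = ["Green Lily", "Williams Harry", "Williams Harry"]

print(subtract_lines(file_a, file_b))
```

As with the Notepad++ regex, only the last copy of a duplicated line survives, and a list that itself contains a line starting with `---` would confuse the separator.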
Scott Sumner last edited by
I'm concerned about you throwing about the terminology "multi line" without really explaining what that means to your task at hand. Sure, it is 100% clear to you, but to people that might try to help you…it may lead to a misunderstanding. Sure, if @guy038 has given you enough info to move forward, go for it, but if not, how about showing some sample before and desired/after data to elicit further help?
rugabunda last edited by
Thank you @guy038, IT WORKED! you figured it out like a pro right off the bat. Your expressions worked. Thank you for your detailed and able assistance!
As for what I stated about simplified multi line searching, please read it again with his correct interpretation in context. I would like to perform the same operation without having to use a single expression; it should be as easy as copy/paste, click replace, done. Suppose there were a box one could check that states "treat each line as independent search" or "line-by-line search", something to that effect, so that each contiguous line in the search dialogue box performed the search and replace independently, one after another. As it stands, I can search for one line and replace that, no problem. But I can't just as easily insert multiple lines of the entire hosts list and hit replace all, with a tick box that treats each individual line as an independent search. Should be simple, IMO.
Thank you once again! You are awesome guy038!
Scott Sumner last edited by
I still think the terminology used is bad. I’d refer to this generically as a “list-based” replacement or a “lookup-table” replacement. “Multiline” means something totally different, at least to me…
A generic workaround would be the following Windows console command:
(for /f "usebackq delims=" %a in ("File_A.txt") do @(findstr /x /c:"%a" "File_B.txt" 1>NUL 2>NUL || echo %a)) > "File_C.txt"
This would produce a file File_C.txt containing all lines of File_A.txt which are not part of File_B.txt.
The command above is designed for direct execution from a console window. If you want to put it in a batch file, you have to double the %-sign before the variable name (%%a instead of %a).
If you want to search case-insensitively, add the switch /i to the findstr call.
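The same "keep the lines of A that are not in B" logic can be sketched in Python, for anyone who prefers to see it spelled out. This is only an illustration of the idea, not a drop-in replacement for the console command: the function name `lines_not_in` and its `ignore_case` parameter are made up here, and `str.casefold` is only an approximation of what `findstr /i` does.

```python
def lines_not_in(a_lines, b_lines, ignore_case=False):
    """Return the lines of a_lines that do not occur, as whole lines, in b_lines.

    Duplicates inside a_lines are kept, like with the console command.
    """
    # Normalise lines when a case-insensitive comparison is requested.
    key = str.casefold if ignore_case else (lambda s: s)
    exclude = {key(line) for line in b_lines}
    return [line for line in a_lines if key(line) not in exclude]


file_a = ["Smith Oliver", "Davies Isla", "Davies Isla", "Green Lily"]
file_b = ["green lily"]

print(lines_not_in(file_a, file_b))                    # case-sensitive
print(lines_not_in(file_a, file_b, ignore_case=True))  # case-insensitive
```

Building a set from the smaller list once means each line of the bigger list is checked with a single lookup, instead of rescanning File_B.txt for every line.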
Very clever use of the two DOS commands for and findstr !
It's still possible to shorten this command line a bit, if :
- The filenames, without extension, have fewer than …
- The filenames do not contain any space character
=> Then, the option usebackq, of command for, is not necessary and the double-quotes surrounding the filenames are useless, too !
Secondly, if we suppose that File_C.txt does not exist in the current directory, we can omit the outer commands grouping (.....) and use the DOS append to file symbol >>
Thirdly, it is better to drop the error redirection to the NUL file, 2>NUL : if File_B.txt does not exist in the current folder, you'll see the error message on the console, as well as …
So, if that command line is not part of a batch file, the final syntax could be :
for /f "delims=" %L in (File_A.txt) do @(findstr /x /c:"%L" File_B.txt 1>NUL || echo %L) >> File_C.txt
Note that I changed the %a variable, storing each line of File_A.txt, with the %L syntax ( for Line )
I forgot to speak about case sensitivity ! So, the regex search should be :
(?i-s)^(.+\R)(?=((?s).+?\R)?\1)|(?s)^---.+, if you need a case-insensitive search
(?-is)^(.+\R)(?=((?s).+?\R)?\1)|(?s)^---.+, if you need a case-sensitive search
Your "DOS" method does not delete any duplicate of File_A.txt, which is not in File_B.txt, of course. Anyway, it's just a "side-effect" of my regex ;-))
Batch scripting is a minefield, and thus over the years I got used to some best practices to avoid certain problems.
One of these problems is blanks in file names, so I always surround file names with double quotes. Maybe the script gets changed in the future by an inexperienced user who uses file names with blanks; with my double-quote policy that is no problem.
The command grouping with (...) around the FOR loop is intentional. It gives the loop a big performance boost in case the script is used to write a lot of lines, because the output file has to be opened only once. @rugabunda talks of 17000 lines he wants to process, so it's a pretty good idea to keep an eye on performance. Furthermore, your solution of appending the script's output to the resulting file makes it necessary to clear the output file before every script run.
Addendum: Because of my last point in the posting above, it is also necessary to redirect error messages of FINDSTR to the NUL device, to avoid that they are written to the output file. I also think that they are useless in the context of the problem discussed here.
In summary, my command has not much potential to be optimised. ;)
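The "open the output file only once" point can be illustrated with a small Python sketch. It assumes the three file names used in this thread and UTF-8 text; the function name `write_difference` is hypothetical, and this is a sketch of the idea rather than a port of the console command.

```python
from pathlib import Path


def write_difference(path_a, path_b, path_out, encoding="utf-8"):
    """Write every line of path_a that is not a line of path_b to path_out.

    The output file is opened once, in write mode, so it never has to be
    cleared by hand before a new run. Returns the number of lines kept.
    """
    # Raises FileNotFoundError right away if path_b is missing, instead of
    # reporting the same error once per line.
    exclude = set(Path(path_b).read_text(encoding=encoding).splitlines())
    kept = 0
    with open(path_a, encoding=encoding) as src, \
            open(path_out, "w", encoding=encoding) as dst:
        for line in src:
            if line.rstrip("\r\n") not in exclude:
                dst.write(line)
                kept += 1
    return kept
```

A call such as `write_difference("File_A.txt", "File_B.txt", "File_C.txt")` would then play the role of the grouped FOR loop with a single `>` redirection.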
Ah, many thanks, Dinkumoil, for your optimization advice !
Of course, having 17,000 messages "impossible to open File-B.txt" is rather idiotic ! I did not realize this, as I was working on my tiny … Appending a line to File_C.txt 17,000 times is not very efficient either, as you said !
- If you previously chose some basic filenames, without spaces, this syntax should be enough :
(for /f "delims=" %L in (File_A.txt) do @(findstr /x /c:"%L" File_B.txt 1>NUL 2>NUL || echo %L)) > File_C.txt
- If your filenames may be long, with some space characters, the @dinkumoil solution, with the usebackq option and filenames surrounded by double quotes, is safer !