MultiLine Replace (multiple hosts in hostsfile)

  • R
    rugabunda
    last edited by rugabunda Sep 16, 2018, 2:20 AM Sep 16, 2018, 2:17 AM

    I have a hosts file that is overrun with too many entries, most of which I don’t want and some of which I do. I have another hosts file containing only those hosts I don’t want. How can I take the list of hosts I don’t want, and multi-line remove/replace them with nothing in my original hosts file?

    I have tried toolbucket multifind and replace, but it searches the multi-line text as one contiguous string, and not each line as an independent entry. There should be an option for this. How simple and how important such a feature could be. Does anyone know how to do this?

    • R
      rugabunda
      last edited by rugabunda Sep 16, 2018, 2:54 AM Sep 16, 2018, 2:53 AM

      I have tried tool bucket multi-line replace, text crawler, and search and replace tool, and none of them work. These multi-line search tools combine the multiple lines into one exact string, rather than searching for each line independently. I have 17 thousand unique lines I want to remove from the text, and maybe 150 I want to keep. A SIMPLE option selection box to treat each new line in the find dialogue as an individual search/replace expression would fit the bill and save users massive headaches the world around.

      • R
        rugabunda
        last edited by rugabunda Sep 16, 2018, 3:33 AM Sep 16, 2018, 3:33 AM

        See, I can search for one line and replace that, no problem. But I can’t just as easily paste in the entire multi-line hosts list and replace it. Should be simple, imo.

        • G
          guy038
          last edited by guy038 Sep 17, 2018, 9:33 AM Sep 16, 2018, 9:39 AM

          Hi, @rugabunda and All,

          This problem has already been discussed many times ! So :

          • You have a first file, named File_A.txt, which contains a list of… whatever you like !

          • You have a second file, named File_B.txt, which contains the same kind of list, with fewer lines than in File_A.txt

          And you would like to delete, in File_A.txt, all the lines, which are, ALSO, present in File_B.txt

          Wouldn’t you ?


          Not difficult with regular expressions ;-)). Note that the following regex S/R compares complete lines, only !

          Here is the method :

          • Start Notepad++

          • Copy/paste all contents of the main File_A.txt file in a new tab

          • At the end of this main list, add a new line of at least 3 dashes ( --- )

          • Now, append all contents of the File_B.txt file, AFTER this line

          • Open the Replace dialog ( Ctrl + H )

          • Select the Regular expression search mode

          • Tick the Wrap around option

          SEARCH (?-s)^(.+\R)(?=((?s).+?\R)?\1)|(?s)^---.+

          REPLACE Leave EMPTY

          • Click once on the Replace All button or several times on the Replace button

          Et voilà ! In the new N++ tab, any line, which was, both, present in the two files, has been deleted, as well as any possible duplicate lines, in the main File_A.txt file !


          For instance, let’s consider a File_A.txt contents, below :

          Smith Oliver
          Jones Olivia
          TaylorGeorge
          Brown Amelia
          Williams Harry
          Wilson Emily
          Johnson Jack
          Davies Isla
          Davies Isla
          Davies Isla
          Robinson Jacob
          Wright Ava
          Green Lily
          Thompson Noah
          Evans Jessica
          Walker Charlie
          White Isabella
          Roberts Muhammad
          Green Lily
          Hall Thomas
          Wood Ella
          Jackson Oscar
          Clarke Mia
          

          Note that the complete names Davies Isla and Green Lily have, respectively, 2 and 1 duplicates, in that example list !

          Then, let’s suppose the following File_B.txt contents :

          Jackson Oscar
          Green Lily
          Thompson Noah
          Jackson Oscar
          Williams Harry
          

          Again, note that the complete name Jackson Oscar has 1 duplicate

          Once you added the separation line and gathered the contents of the two files, with File_A.txt in first position, you’ll get, in the new tab, the text :

          Smith Oliver
          Jones Olivia
          TaylorGeorge
          Brown Amelia
          Williams Harry
          Wilson Emily
          Johnson Jack
          Davies Isla
          Davies Isla
          Davies Isla
          Robinson Jacob
          Wright Ava
          Green Lily
          Thompson Noah
          Evans Jessica
          Walker Charlie
          White Isabella
          Roberts Muhammad
          Green Lily
          Hall Thomas
          Wood Ella
          Jackson Oscar
          Clarke Mia
          ---
          Jackson Oscar
          Green Lily
          Thompson Noah
          Jackson Oscar
          Williams Harry
          

          After performing the regex S/R :

          SEARCH (?-s)^(.+\R)(?=((?s).+?\R)?\1)|(?s)^---.+

          REPLACE Leave EMPTY

          You’ll obtain the expected list :

          Smith Oliver
          Jones Olivia
          TaylorGeorge
          Brown Amelia
          Wilson Emily
          Johnson Jack
          Davies Isla
          Robinson Jacob
          Wright Ava
          Evans Jessica
          Walker Charlie
          White Isabella
          Roberts Muhammad
          Hall Thomas
          Wood Ella
          Clarke Mia
          

          Notes :

          • As usual, at beginning, the (?-s) modifier means that the dot . character will match any single standard character

          • Then, the part ^(.+\R), matches, from beginning of line, ^ , all the characters, of any non-blank line, .+ , with its line-break characters \R , stored as group 1, due to the parentheses (....)

          • But ONLY IF the regex ((?s).+?\R)?\1, contained in a positive look-ahead structure (?=.......) is verified

          • That is to say, if a second occurrence of the group 1 ( the complete line ), \1 is found, after an optional block, (....)? , of complete lines, (?s).+?\R, which represents the shortest non-null range of any character, .+? ( standard or EOL ones ) due to the (?s) modifier and ending with a line-break, \R

          • Now, when no more duplicates can be found, the regex engine tries the second alternative of the search regex, after the | alternation symbol, that is to say the regex (?s)^---.+, which grabs the longest range of any character, from the separation line till the very end of the file, due to the (?s) modifier which means that dot . matches any single character ( standard and EOL one )

          • In replacement, all the lines or block matched are, simply, deleted, as the replacement zone is empty
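
The notes above can be tried outside Notepad++ with a rough Python adaptation. This is only a sketch under stated assumptions: Python's `re` module has no `\R` and does not allow switching `(?s)` on in the middle of a pattern, so `\R` is replaced by `\n` and the dot-matches-newline parts are written as scoped `(?s:...)` groups. The name `apply_search_regex` is hypothetical and not part of the original post.

```python
import re

# Approximation of the Notepad++ SEARCH regex for Python's re module:
#   \R          -> \n          (assumes the text uses \n line endings)
#   ((?s).+?\R) -> (?s:.+?)\n  (scoped "dot matches newline")
#   (?s)^---.+  -> ^---(?s:.+)
PATTERN = r"^(.+\n)(?=(?:(?s:.+?)\n)?\1)|^---(?s:.+)"


def apply_search_regex(text, ignore_case=False):
    """Delete every line that occurs again later, then the '---' tail."""
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    return re.sub(PATTERN, "", text, flags=flags)


# File_A lines, the separator line, then File_B lines.
combined = (
    "Smith Oliver\n"
    "Green Lily\n"
    "Davies Isla\n"
    "Davies Isla\n"
    "Jackson Oscar\n"
    "---\n"
    "Jackson Oscar\n"
    "Green Lily\n"
)
print(apply_search_regex(combined), end="")
```

As in the notes, a line is deleted only when the same complete line appears somewhere after it, so the last copy of a duplicate in File_A.txt survives unless File_B.txt also lists it.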

          Best Regards,

          guy038

          • S
            Scott Sumner @rugabunda
            last edited by Sep 16, 2018, 12:56 PM

            @rugabunda

            I’m concerned about you throwing about the terminology “multi line” without really explaining what that means to your task at hand. Sure, it is 100% clear to you, but to people that might try to help you…it may lead to a misunderstanding. Sure, if @guy038 has given you enough info to move forward, go for it, but if not, how about showing some sample before and desired/after data to elicit further help?

            • R
              rugabunda
              last edited by Sep 16, 2018, 10:31 PM

              Thank you @guy038, IT WORKED! you figured it out like a pro right off the bat. Your expressions worked. Thank you for your detailed and able assistance!

              As for what I stated about simplified multi line searching, please read it again with his correct interpretation in context. I would like to perform the same operation without having to use a single expression, it should be as easy as copy/paste click replace, done. Were there a box one can check that states “treat each line as independent search” or “line-by-line search” something to that effect. So that each contiguous line in the search dialogue box performed the search and replace independently, one after another. As it stands I can search for one line, and replace that no problem. but i can’t just as easily insert a multi-line of the entire hosts lists and hit replace all, with a tick box that treats each individual line as an independent search. should be simple imo.

              Thank you once again! You are awesome guy038!

              • S
                Scott Sumner @rugabunda
                last edited by Sep 17, 2018, 1:11 PM

                @rugabunda

                I still think the terminology used is bad. I’d refer to this generically as a “list-based” replacement or a “lookup-table” replacement. “Multiline” means something totally different, at least to me…

                • dinkumoilD
                  dinkumoil
                  last edited by Sep 17, 2018, 1:43 PM

                  @rugabunda

                  A generic workaround would be the following Windows console command:

                  (for /f "usebackq delims=" %a in ("File_A.txt") do @(findstr /x /c:"%a" "File_B.txt" 1>NUL 2>NUL || echo %a)) > "File_C.txt"
                  

                  This would produce a file File_C.txt containing all lines of File_A.txt which are not part of File_B.txt.

                  Please note:

                  • The command above is designed for direct execution from a console window. If you want to put it in a batch file you have to double the %-sign before the variable %a.

                  • If you want to search case-insensitively, add the switch /i to the findstr command.
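
For readers who would rather script this than use the console, the same "lines of File_A.txt that are not in File_B.txt" filter can be sketched in Python. This is an illustration under assumptions, not a translation of the command: the function name `lines_not_in` and the `ignore_case` parameter are hypothetical, and `str.casefold` only approximates what the /i switch does.

```python
def lines_not_in(main_lines, unwanted_lines, ignore_case=False):
    """Return the lines of main_lines that do not appear in unwanted_lines.

    Whole lines are compared, like findstr /x /c:"...". Duplicates in
    main_lines are kept, and the original order is preserved.
    """
    normalize = str.casefold if ignore_case else (lambda s: s)
    # Build the lookup set once instead of rescanning the list per line.
    unwanted = {normalize(line) for line in unwanted_lines}
    return [line for line in main_lines if normalize(line) not in unwanted]


file_a = ["Smith Oliver", "Green Lily", "Davies Isla", "Davies Isla", "Jackson Oscar"]
file_b = ["Jackson Oscar", "Green Lily"]
for line in lines_not_in(file_a, file_b):
    print(line)
```

Using a set means each line of the main list costs one lookup, whereas the console command starts findstr once per line of File_A.txt.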

                  • G
                    guy038
                    last edited by guy038 Sep 17, 2018, 7:44 PM Sep 17, 2018, 7:34 PM

                    Hi, @rugabunda, @scott-sumner, @dinkumoil and All,

                    @dinkumoil :

                    Very clever use of the two DOS commands for and findstr !

                    It’s still possible to shorten a bit this command line, as :

                    • The filenames, without extension, have fewer than 8 characters

                    • The filenames do not contain any space character

                    => Then, the option usebackq, of command for, is not necessary and double-quotes, surrounding filenames, are useless, too !

                    • Secondly, if we suppose that File_C.txt does not exist in current directory, we can omit the outer commands grouping (.....) and use the DOS append to file symbol >>

                    • Thirdly, it is better to drop the error redirection to the NUL device, 2>NUL, because, if File_B.txt does not exist in the current folder, you’ll see the error message on the console, as well as File_A.txt

                    So, if that command line is not part of batch file, the final syntax could be :

                    for /f "delims=" %L in (File_A.txt) do @(findstr /x /c:"%L" File_B.txt 1>NUL || echo %L) >> File_C.txt
                    

                    Note that I changed the %a variable, storing each line of File_A.txt, to the %L syntax ( L for Line )

                    @rugabunda :

                    I forgot to speak about case sensitivity ! So, the regex search should be :

                    • SEARCH (?i-s)^(.+\R)(?=((?s).+?\R)?\1)|(?s)^---.+ , if you need a case-insensitive search

                    • SEARCH (?-is)^(.+\R)(?=((?s).+?\R)?\1)|(?s)^---.+ , if you need a case-sensitive search

                    Regards,

                    guy038

                    P.S. :

                    @dinkumoil :

                    Your “DOS” method does not delete any duplicate of File_A.txt, which is not in File_B.txt, of course. Anyway, it’s just a “side-effect” of my regex ;-))
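
The difference between the two approaches can be made concrete with a small Python sketch. It is an illustration under assumptions rather than either author's code: the hypothetical function `drop_listed_and_duplicates` mimics what the regex does (a line goes away if the same line occurs later in the main list or anywhere in the second list), which is why only the last copy of a duplicate survives.

```python
def drop_listed_and_duplicates(main_lines, unwanted_lines):
    """Mimic the regex S/R: drop listed lines and all but the last duplicate."""
    seen = set(unwanted_lines)
    kept_reversed = []
    # Walk backwards, so "already seen" means "occurs later in the main
    # list, or anywhere in the unwanted list".
    for line in reversed(main_lines):
        # Blank lines are always kept, because .+ never matches them.
        if line == "" or line not in seen:
            kept_reversed.append(line)
            seen.add(line)
    return kept_reversed[::-1]


file_a = ["Smith Oliver", "Davies Isla", "Davies Isla", "Green Lily",
          "Thompson Noah", "Green Lily", "Jackson Oscar", "Clarke Mia"]
file_b = ["Jackson Oscar", "Green Lily", "Thompson Noah", "Jackson Oscar"]
for line in drop_listed_and_duplicates(file_a, file_b):
    print(line)
```

With this sample, a single Davies Isla line remains, whereas a plain "not in File_B.txt" filter would keep both copies.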

                    • dinkumoilD
                      dinkumoil
                      last edited by Sep 17, 2018, 9:29 PM

                      @guy038

                      Batchscript is a mine field and thus over the years I got used to some best practices to avoid certain problems.

                      One of these problems is blanks in file names, so I always surround file names with double quotes. Maybe the script gets changed in the future by an inexperienced user who uses file names with blanks; with my double-quote policy, that is no problem.

                      The command grouping with (...) around the FOR loop is intentional. It gives the loop a big performance boost in case the script is used to write a lot of lines, because the output file has to be opened only once. @rugabunda talks of 17000 lines he wants to process, so it’s a pretty good idea to keep an eye on performance. Furthermore, your solution of appending the script’s output to the resulting file makes it necessary to clear the output file before every script run.
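
The same two points (open the output once, and overwrite rather than append) carry over to a scripted version. The following Python sketch is only an illustration under assumptions: the function name `write_filtered` and its parameters are hypothetical, and UTF-8 input files are assumed.

```python
from pathlib import Path


def write_filtered(main_path, unwanted_path, output_path):
    """Write the lines of main_path that are not in unwanted_path.

    The output file is opened a single time in "w" mode, so it is
    truncated on every run and never has to be cleared by hand.
    Returns the number of lines written.
    """
    unwanted = set(Path(unwanted_path).read_text(encoding="utf-8").splitlines())
    kept = 0
    with open(main_path, encoding="utf-8") as src, \
         open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            text = line.rstrip("\r\n")
            if text not in unwanted:
                dst.write(text + "\n")
                kept += 1
    return kept
```

Called as `write_filtered("File_A.txt", "File_B.txt", "File_C.txt")`, it can be run repeatedly without the results piling up in File_C.txt.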

                      • dinkumoilD
                        dinkumoil
                        last edited by Sep 17, 2018, 9:42 PM

                        @guy038

                        Addendum: Because of my last point in the posting above, it is also necessary to redirect the error messages of FINDSTR to the NUL device, to avoid them being written to the output file. I also think that they are useless in the context of the problem discussed here.

                        In summary, my command has not much potential to be optimised. ;)

                        • G
                          guy038
                          last edited by guy038 Sep 17, 2018, 10:31 PM Sep 17, 2018, 10:29 PM

                          Hi, @rugabunda, @scott-sumner, @dinkumoil and All,

                          Ah, many thanks, Dinkumoil, for your optimization advice !

                          Of course, having 17,000 messages “impossible to open File_B.txt” is rather idiotic ! I did not realize this, as I was working on my tiny File_A.txt :-(

                          Also, appending a line to File_C.txt 17,000 times is not very efficient either, as you said !

                          Finally :

                          • If you previously chose some basic filenames, without spaces, this syntax should be enough :
                          (for /f "delims=" %L in (File_A.txt) do @(findstr /x /c:"%L" File_B.txt 1>NUL 2>NUL || echo %L)) > File_C.txt
                          
                          • If your filenames may be long, with some space characters, the @dinkumoil solution, with the usebackq option and filenames surrounded by double quotes, is safer !

                          Cheers,

                          guy038

                          The Community of users of the Notepad++ text editor.