Massive list and massive search and replace?
-
I have multiple text files (over 100)
and I need to do a lot of text replacements in them.
Example:
- james - James
- calvin - Calvin
- new york - New York
Is it possible to upload a list of the text that needs replacing (maybe from Excel?) and then search and replace?
-
There really isn’t a native-to-Notepad++ way of doing this.
If you’re willing to use a scripting plugin, though, we can “get 'er done” – how about it? Are you willing to go to the “complication” of setting up scripting?
-
@Alan-Kilborn I don’t mind a little scripting. But I didn’t know Notepad++ can do scripting.
-
@Calvin-Foo said in Massive list and massive search and replace?:
I didn’t know Notepad++ can do scripting
Yes, here’s a good starting point REFERENCE.
If you can give me a little time, I’ll put together a demo in my upcoming spare time. We have some scripts here on this site for “replacing from a list” but I don’t think we have anything that operates over a folder tree of files. I could pull together something that does both.
-
@Alan-Kilborn it will be great if you can help me on this. TQVM
-
Ok, so I found some time to finish the script.
Here’s how it works:
You start with a file open and active in Notepad++ that contains your desired replacements. This file should be saved into the same folder in which you want to do the replacements. The file can have any name and needs to have the following format for your replacements list:
blue->orange
replace this->with this
the delimiter is->a minus followed by a greater than
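At that point, you run the script. It will prompt you through a series of four questions about the operation, giving you a chance to validate that you are doing what you intend at each step: confirm the replacement pairs and the folder, choose whether to include subfolders, supply a filespec filter list, and give a final confirmation.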
Somewhat obviously, after you give the final “Yes” the real work will actually be done and the indicated replacements made.
I call the script ReplaceInFilesFromListInActiveTab.py and here is its listing:

# -*- coding: utf-8 -*-
from __future__ import print_function

# references:
#  https://community.notepad-plus-plus.org/topic/23638/massive-list-and-massive-search-and-replace
#  also possibly https://community.notepad-plus-plus.org/topic/22601
#  also possibly https://community.notepad-plus-plus.org/topic/22721
#  also possibly https://community.notepad-plus-plus.org/topic/23495

from Npp import *
import inspect
import os
import re
import glob

#-------------------------------------------------------------------------------

class RIFFLIAT(object):

    def __init__(self):

        self.debug = True if 0 else False
        if self.debug:
            console.show()
            console.clear()

        self.this_script_name = inspect.getframeinfo(inspect.currentframe()).filename.split(os.sep)[-1].rsplit('.', 1)[0]

        # the active tab has the list of the substitution pairs
        substitutions_list_file_path = notepad.getCurrentFilename()
        if not os.path.isfile(substitutions_list_file_path):
            self.mb('Substitution list file must be a hard-named file in the file system, i.e., not e.g. "new 2"')
            return
        self.print('substitutions_list_file_path:', substitutions_list_file_path)

        find_and_repl_match_list = []
        delimiter = '->'
        editor.research(r'(?-s)^(.+?)' + delimiter + r'(.+)', lambda m: find_and_repl_match_list.append((m.group(1), m.group(2))))
        if len(find_and_repl_match_list) == 0:
            self.mb('\r\n'.join([
                'The substitution list in the active file has no findwhat/replacewith pairs\r\n',
                'Format of file is, 1 pair per line, using {d} as a delimiter, no extra spaces:\r\n'.format(d=delimiter),
                'find1{d}replace1'.format(d=delimiter),
                'find2{d}replace2'.format(d=delimiter),
                '...{d}...'.format(d=delimiter),
            ]))
            return

        sample_repl_pairs_summary_list = []
        three_or_less_sub_pairs = min(3, len(find_and_repl_match_list))
        num_sub_pairs_above_3 = len(find_and_repl_match_list) - three_or_less_sub_pairs
        max_chars_show = 20
        for (find_what, replace_with) in find_and_repl_match_list[ 0 : three_or_less_sub_pairs ]:
            if len(find_what) > max_chars_show: find_what = find_what[ 0 : max_chars_show] + '...'
            if len(replace_with) > max_chars_show: replace_with = replace_with[ 0 : max_chars_show] + '...'
            sample_repl_pairs_summary_list.append('"{fw}" with "{rw}"'.format(fw=find_what, rw=replace_with))
        if num_sub_pairs_above_3 > 0: sample_repl_pairs_summary_list.append('(and {} more)'.format(num_sub_pairs_above_3))

        search_folder_top_level_path = substitutions_list_file_path.rsplit(os.sep, 1)[0] + os.sep
        self.print('search_folder_top_level_path:', search_folder_top_level_path)

        if not self.yes_no('\r\n\r\n'.join([
            'Q1 of 4:\r\n',
            'Perform these replacements (specified in the active file content):',
            '\r\n'.join(sample_repl_pairs_summary_list) + '\r\n',
            'in the files in this folder?',
            search_folder_top_level_path,
            '-' * 60,
            'IT IS STRONGLY SUGGESTED TO MAKE A BACKUP',
            'OF ALL SOURCE FILES BEFORE RUNNING THIS!',
        ])): return

        process_subfolders = self.yes_no_cancel('\r\n\r\n'.join([
            'Q2 of 4:\r\n',
            'Do replacements in files in SUBFOLDERS of this folder also?',
            search_folder_top_level_path,
        ]))
        if process_subfolders == None: return  # user cancel
        self.print('process_subfolders:', process_subfolders)

        default_filespec = '*.txt'
        filter_input = self.prompt(
            'Q3 of 4:\r\n' + \
            'Supply filespec filter list (example: *.html *.txt *.log )',
            default_filespec)
        if filter_input == None: return  # user cancel
        filters_list = filter_input.split(' ')
        filters_list = [ f for f in filters_list if len(f) > 0 ]  # remove any empty entries in filters_list
        self.print('filters_list:', filters_list)

        pathnames_of_files_to_replace_in_list = []
        for (root, dirs, files) in os.walk(search_folder_top_level_path):
            for filt in filters_list:
                for p in glob.glob(os.path.join(root, filt)):
                    if p != substitutions_list_file_path:
                        pathnames_of_files_to_replace_in_list.append(p)
            if not process_subfolders: break

        if len(pathnames_of_files_to_replace_in_list) == 0:
            self.mb('No files matched specified filter(s)')
            return

        num_files_to_examine = len(pathnames_of_files_to_replace_in_list)

        if not self.yes_no('\r\n\r\n'.join([
            'Q4 of 4:\r\n',
            '---- FINAL CONFIRM ----\r\n',
            'Make replacements in {nfe} candidate files in this folder{b} ?'.format(
                nfe=num_files_to_examine,
                b=' AND below' if process_subfolders else '\r\n(but not its subfolders)'),
            search_folder_top_level_path,
        ])): return

        pathname_currently_open_in_a_tab_list = []
        for (pathname, buffer_id, index, view) in notepad.getFiles():
            if pathname not in pathname_currently_open_in_a_tab_list:
                pathname_currently_open_in_a_tab_list.append(pathname)

        num_repl_made_in_all_files = 0
        pathnames_with_content_changed_by_repl_list = []

        for pathname in pathnames_of_files_to_replace_in_list:

            if pathname in pathname_currently_open_in_a_tab_list:
                self.print('switching active tab to', pathname)
                notepad.activateFile(pathname)
                editor.beginUndoAction()
            else:
                self.print('opening', pathname)
                notepad.open(pathname)
                if notepad.getCurrentFilename() != pathname: continue  # shouldn't happen

            for (find_what, replace_with) in find_and_repl_match_list:

                # since the editor.replace() function won't tell us how many replacements it made,
                #  count them by searching for the matches BEFORE doing the replacement
                self.num_repl_made_in_this_file = 0
                def match_found(m): self.num_repl_made_in_this_file += 1
                editor.search(find_what, match_found)

                if self.num_repl_made_in_this_file > 0:
                    self.print('replacing "{fw}" with "{rw}" {n} times'.format(
                        fw=find_what, rw=replace_with, n=self.num_repl_made_in_this_file))
                    num_repl_made_in_all_files += self.num_repl_made_in_this_file
                    if pathname not in pathnames_with_content_changed_by_repl_list:
                        pathnames_with_content_changed_by_repl_list.append(pathname)
                    # FINALLY, the actual replacement!
                    editor.replace(find_what, replace_with)

            if pathname in pathname_currently_open_in_a_tab_list:
                editor.endUndoAction()
            else:
                if editor.getModify():
                    self.print('saving', pathname)
                    notepad.save()
                self.print('closing', pathname)
                notepad.close()

        # restore tab that was active before we started
        notepad.activateFile(substitutions_list_file_path)

        self.mb('\r\n\r\n'.join([
            '---- DONE! ----',
            '{nr} total replacements made in {nrf} files'.format(nr=num_repl_made_in_all_files, nrf=len(pathnames_with_content_changed_by_repl_list)),
            '(of {nfe} files matching filters provided)'.format(nfe=num_files_to_examine),
        ]))

    def print(self, *args):
        if self.debug:
            print('RIFFLIAT:', *args)

    def mb(self, msg, flags=0, title=''):  # a message-box function
        return notepad.messageBox(msg, title if title else self.this_script_name, flags)

    def yes_no(self, question_text):
        retval = False
        answer = self.mb(question_text, MESSAGEBOXFLAGS.YESNO, self.this_script_name)
        return True if answer == MESSAGEBOXFLAGS.RESULTYES else False

    def yes_no_cancel(self, question_text):
        retval = None
        answer = self.mb(question_text, MESSAGEBOXFLAGS.YESNOCANCEL, self.this_script_name)
        if answer == MESSAGEBOXFLAGS.RESULTYES: retval = True
        elif answer == MESSAGEBOXFLAGS.RESULTNO: retval = False
        return retval

    def prompt(self, prompt_text, default_text=''):
        if '\n' not in prompt_text: prompt_text = '\r\n' + prompt_text
        prompt_text += ':'
        return notepad.prompt(prompt_text, self.this_script_name, default_text)

#-------------------------------------------------------------------------------

if __name__ == '__main__': RIFFLIAT()

For basic information about setting up scripting, see the REFERENCE I provided in an earlier post in this thread.
(And BTW, thanks to @PeterJones for some prerelease testing on this!)
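For anyone who wants to try the file-gathering step of the script on its own, outside Notepad++, here is a minimal sketch in plain Python. It is not part of the script above: the function name `collect_files` and its parameters are made up for illustration, and unlike the script it skips a path that two filters both match.

```python
import glob
import os


def collect_files(top, filters, include_subfolders, exclude=None):
    """Return paths under `top` that match any filespec in `filters`.

    `exclude` is one path to leave out (the script leaves out the
    substitution list file itself). Hypothetical helper, for illustration.
    """
    found = []
    for root, dirs, files in os.walk(top):
        for filt in filters:
            for path in glob.glob(os.path.join(root, filt)):
                # assumption: report each file once even if two filters match it
                if path != exclude and path not in found:
                    found.append(path)
        if not include_subfolders:
            break  # os.walk yields the top folder first, so stop after it
    return found
```

Calling `collect_files(folder, ['*.txt', '*.log'], False)` would look only at the top folder, which mirrors answering "No" to the subfolders question.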
-
@Alan-Kilborn Thanks, but wow, this seems a bit too complicated for me.
Is it possible to simplify it just to read an Excel page, find A2 replace it with B2 for every text files are opened in NPP? I can just add new words in the Excel file, then I just run the script.
-
Most of the complication is just setting up Python Script plugin and installing the script once.
After that, you just have to run the script when you have a file like this open and active:
blue->orange
replace this->with this
the delimiter is->a minus followed by a greater than
It’s really not hard to create that substitution file.
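To show how little there is to that file format, here is a rough sketch in plain Python (run outside Notepad++) that reads such a list into find/replace pairs. The function name `parse_pairs` is hypothetical, and the `re.escape` call is an addition that the script above does not make; it only matters if the delimiter is changed to something containing regex metacharacters.

```python
import re


def parse_pairs(text, delimiter='->'):
    """Return a list of (find_what, replace_with) tuples, one per valid line."""
    # re.escape keeps a delimiter such as '|' from being read as regex syntax
    pattern = re.compile(r'(.+?)' + re.escape(delimiter) + r'(.+)')
    pairs = []
    for line in text.splitlines():  # splitlines copes with \n and \r\n endings
        m = pattern.match(line)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs


sample = (
    "blue->orange\n"
    "replace this->with this\n"
    "the delimiter is->a minus followed by a greater than\n"
)
for find_what, replace_with in parse_pairs(sample):
    print(repr(find_what), 'becomes', repr(replace_with))
```

Lines without the delimiter are simply skipped, and everything after the first delimiter on a line is treated as the replacement text.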
And having the replacement data in Excel would make it harder for Alan to write a script for you (and this forum is not a code-writing service), but because the script would still be written in PythonScript, you would still have to install PythonScript and install that script once. Running it would be just as easy for you whether you run it with a text file as the source of the search->replace pairs or whether you run a script that has to parse some external Excel spreadsheet (easier, actually, for the text file, because then you don’t have to also run Excel just to prepare for a search-and-replace in Notepad++). He would still have to have all those confirmation dialogs whether the map is in Excel or in Notepad++… and he’d also have to have another dialog which asks you where the Excel spreadsheet was.
He wrote this not just for you, but also for all the other people who ask for nearly the same functionality (we’ve seen similar requests a lot over the years, and he finally decided that we needed one generic script to handle them all, so that we’d stop having to write customized scripts for each user). If using this generic script is too complicated for you, you will not like any implementation that anyone here is able to give you.
Good luck.
-
@Calvin-Foo said in Massive list and massive search and replace?:
Is it possible to simplify it just to read an Excel page, find A2 replace it with B2 for every text files are opened in NPP?
If you are referring to an Excel file with extension XLSX then no, Notepad++ does NOT read files which are binary in nature very well. It is after all a TEXT editor, not a Binary editor.
However since you refer to an “Excel page” and refer to words in 2 columns, that could also be a CSV (comma separated value) file. And whilst @Alan-Kilborn has used a TSV (tab separated value) file, the 2 are very similar. He possibly could alter his code to use the comma instead of a tab, but possibly used the tab to prevent possible confusion within words.
But doing that minor change to his code isn’t going to simplify the process anyways. Just be thankful he has gone to such lengths to help you. Sometimes doing processes such as you outlined can be made easier, but will still require time to setup.
Terry
PS I see @PeterJones has stated the same.
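If the word list really does live in Excel, one possible route is to save the sheet as CSV and convert it to the script's list format. The sketch below assumes Python 3 run outside Notepad++, a two-column CSV (find text, then replacement), and a hypothetical function name `csv_to_pairs_text`; none of this comes from the script itself.

```python
import csv
import io


def csv_to_pairs_text(csv_text, delimiter='->', skip_header=False):
    """Turn two-column CSV text into 'find->replace' lines.

    Rows with a missing or empty first or second column are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if skip_header:
        rows = rows[1:]  # assumption: row 1 holds column titles
    lines = []
    for row in rows:
        if len(row) >= 2 and row[0] and row[1]:
            lines.append(row[0] + delimiter + row[1])
    return '\n'.join(lines)


print(csv_to_pairs_text("james,James\ncalvin,Calvin\nnew york,New York\n"))
```

The printed text can be pasted into a file saved next to the files to be changed, and the script run from there.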
-
@Terry-R said in Massive list and massive search and replace?:
And whilst @Alan-Kilborn has used a TSV (tab separated value) file
Actually it isn’t a “tab”, although I can see why you’d think that. I chose - followed by > as the delimiter. The delimiter is specifically variable-ized in the script, so one could easily change it to whatever is desired.
-
@Calvin-Foo said in Massive list and massive search and replace?:
Is it possible to simplify it just to read an Excel page, find A2 replace it with B2 for every text files are opened in NPP?
I suppose it IS possible, but not by me.
The intent of the script was to solve kind of a general case problem, in a general way.
Of course anyone can treat it as a demo, and feel free to modify it at will.
-
I guess I need to learn from ground up. I only have experience in writing ASP 3.0 and SQL Server.
I guess I just start from there
Maybe anyone can give me a simple guide on How to write a simple replace text script? I’ll further study from there and include a list of text (maybe import from CSV)
-
There’s very, very, very little to learn. First, follow the instructions in Alan’s second post to you (“REFERENCE” link) LIKE A MONKEY.
Here’s a first script you can run:
#! python
import sys
print "Old style print syntax"
sys.stdout.write("Calvin-Foo's first script -- hello from Python %s\n" % (sys.version,))
You don’t need to know what any of the lines mean or what they do. (Once you get it running, you can hack at it for fun).
There’s a lot of complexity in airplanes and elevators and keyboards and phones that you don’t see and don’t need to deal with. It’s really quite similar.
-
@Calvin-Foo said in Massive list and massive search and replace?:
I guess I need to learn from ground up.
I guess I just start from there
I’ll further study from there and include a list of text (maybe import from CSV)
In case it isn’t obvious, you could take YOUR data, in whatever (textual) format, and use Notepad++ to change it with a replacement operation into MY demo format, and then just run with the demo solution.
This avoids programming and could (probably) be made into a N++ macro for easy repetitive running.
As an example, take your original problem statement data (I know it isn’t your real data, but we have none of that here, so…):
1. james - James
2. calvin - Calvin
3. new york - New York
You could change that into the needed input format for the script by this operation:
Find: (?-s)^\d+\. (.+?) - (.+)
Replace: ${1}->${2}
Search mode: Regular expression
Wrap around: Checked
Action: Replace All button
After the replace-all, your data would then look like this:
james->James
calvin->Calvin
new york->New York
which would be a direct feed-in to the demo script.
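For comparison, the same transform can be sketched in plain Python outside Notepad++. This is an approximation, not the Notepad++ operation itself: the leading (?-s) is dropped because in Python's `re` module the dot does not match newlines unless `re.DOTALL` is given, and the replacement uses `\1`/`\2` in place of `${1}`/`${2}`. The variable names are made up.

```python
import re

numbered = "1. james - James\n2. calvin - Calvin\n3. new york - New York"

# ^\d+\. consumes the list number and the two groups capture the text on
# either side of " - ". re.MULTILINE lets ^ and $ match on every line.
converted = re.sub(
    r'^\d+\. (.+?) - (.+)$',
    r'\1->\2',
    numbered,
    flags=re.MULTILINE,
)
print(converted)
```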
this seems a bit too complicated for me
Of course, this data transform may involve learning some regular expressions (don’t know your expertise), but I don’t think it is too much to ask people that request a moderately-complex solution to a problem to do some sort of learning of their own along the way.
How to write a simple replace text script?
Well about the simplest one I can think of would be a one-liner:
editor.replace('apple', 'Apple')
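Stretching that one-liner to a short list of pairs could look something like the sketch below. It assumes the PythonScript plugin, where `editor` refers to the active tab; the names `PAIRS` and `replace_all_pairs` are made up for illustration, and the import guard is there only so the file can also be loaded outside Notepad++.

```python
try:
    from Npp import editor  # only importable inside the PythonScript plugin
except ImportError:
    editor = None  # allows loading or testing this file outside Notepad++

# hypothetical sample pairs, taken from the original question
PAIRS = [
    ('james', 'James'),
    ('calvin', 'Calvin'),
    ('new york', 'New York'),
]


def replace_all_pairs(ed, pairs):
    """Call ed.replace(find_what, replace_with) for each pair, in order."""
    for find_what, replace_with in pairs:
        ed.replace(find_what, replace_with)


if editor is not None:
    editor.beginUndoAction()  # group the edits into a single undo step
    try:
        replace_all_pairs(editor, PAIRS)
    finally:
        editor.endUndoAction()
```

This only touches the active tab; looping over every open file would need `notepad.getFiles()` and `notepad.activateFile()` as used in the full script.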
-
this script is FANTASTIC.
i do have a question though- in the final output it prints how many replacements were made, but is there any way to see what the actual replacements were?
looking into the code i see
if self.num_repl_made_in_this_file > 0:
    self.print('replacing "{fw}" with "{rw}" {n} times'.format(
        fw=find_what, rw=replace_with, n=self.num_repl_made_in_this_file))
but i don’t see where that would get printed- it doesn’t show up in the python console either
again huge thanks for this one
-
@nerdyone255 said :
this script is FANTASTIC.
Well…glad you like it.
but i don’t see where that would get printed- it doesn’t show up in the python console either
The self.print() function calls are really meant as debug helpers while testing the script. Thus, in the version of the script above, they don’t do anything because the debug variable is set to False. If you change the 0 to a 1 in this line:
self.debug = True if 0 else False
or simply change it to:
self.debug = True
then the output of the self.print() calls will go to the PythonScript console window. You’ll see the output you indicated you were interested in, plus output from other things that happen while the script is running.
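The debug-gating idea can be seen in isolation in this small standalone sketch (plain Python 3, outside Notepad++). The class name `Demo`, the method name `debug_print`, and the `lines` list are invented for illustration and are not in the script.

```python
class Demo(object):

    def __init__(self, debug=False):
        self.debug = debug
        self.lines = []  # kept only so the behaviour is easy to inspect

    def debug_print(self, *args):
        # does nothing unless the debug flag was switched on
        if self.debug:
            message = ' '.join(str(a) for a in args)
            self.lines.append(message)
            print('DEMO:', message)


quiet = Demo()                # like "True if 0 else False", i.e. False
quiet.debug_print('replacing "apple" with "Apple"', 3, 'times')

chatty = Demo(debug=True)     # like "True if 1 else False", i.e. True
chatty.debug_print('replacing "apple" with "Apple"', 3, 'times')
```

Only the second call produces any output, which is the same effect as flipping the 0 to a 1 in the script.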
-
@Alan-Kilborn perfect!
-