Copy and paste sections with RegEx or macro?
-
Hello Community.
I have a problem and hope someone can help me.
I have a txt file with questions and answers
Example TXT file:
##xx##
☐ Question1: L123orem i>psum dolor
☐ Question2: sit a>sdfs>m213et,
☑ Question3: con><<set>easdtur sadipscing
##xy##
##xx##
☐ Question1: elifdf343tr, 1sed 23diam nonumyfdg
☐ Question2: 32423df34
☑ Question3: eirmd45345ocxfdvcxvd tem123por
☑ Question4: 543534 in34vifgdunt ut labore
##xy##
and I want to turn it into this (for CSV import):
"
☐ Question1: L123orem i>psum dolor
☐ Question2: sit a>sdfs>m213et,
☐ Question3: con><<set>easdtur sadipscing
";"
☐ Question1: L123orem i>psum dolor
☐ Question2: sit a>sdfs>m213et,
☑ Question3: con><<set>easdtur sadipscing
"
"
☐ Question1: elifdf343tr, 1sed 23diam nonumyfdg
☐ Question2: 32423df34
☐ Question3: eirmd45345ocxfdvcxvd tem123por
☐ Question4: 543534 in34vifgdunt ut labore
";"
☐ Question1: elifdf343tr, 1sed 23diam nonumyfdg
☐ Question2: 32423df34
☑ Question3: eirmd45345ocxfdvcxvd tem123por
☑ Question4: 543534 in34vifgdunt ut labore
"
These markers delimit a section:
##xx##
…
##xy##
The content in between can contain all kinds of characters, and the number of lines varies.
The first character of each line is ☐ or ☑.
When copying, every ☑ must be changed to ☐ in the first copy.
How can this be implemented with Notepad++? I’ve tried using a macro, cut and paste with a new tab, but that doesn’t work right. Does anyone have an idea using RegEx?
I would appreciate some help.
-
I’ve tried using a macro, cut and paste with a new tab, but that doesn’t work right.
I beg to differ.
I just recorded a macro with the following sequence:
- Select All
- Copy
- File > New
- Paste (into new tab)
- Search > Replace: FIND = ☑, REPLACE = ☐, SEARCH MODE = Regular Expression => REPLACE ALL
I then saved that macro. I restarted Notepad++, pasted in your example “before” text, and ran the macro: I ended up with a second file that had all the ☑ converted to ☐.
As any regex search-and-replace can be recorded in a macro, I could have made a more complicated regex to accomplish your exact goal. But while I saw that you wanted to change some of the ☑ to ☐, others were not converted, and based on your description, I could not tell which ones should or should not be converted.
If you need help refining the regex, you will need to provide a better description of when the ☑ should and should not be converted to ☐.
-
This looks like it can be solved in two regex phases.
In the first phase, we would make exact copies of the text between record delimiters, while also altering the record delimiters.
RecStart is ##xx## on a line, RecEnd is ##xy## on a line
We match: <RecStart><all in-between text, captured to capture group 1, CG1><RecEnd>
Replace with: <" on a line><CG1><";" on a line><CG1><" on a line>
In the second phase, match all checked box characters in the second copy, and replace with unchecked box characters.
Each record’s second copy is now bound by these record delimiters:
RecStart is ";" on a line, RecEnd is " on a line
The first phase is a fairly straightforward regex (I couldn’t tell from your post if you have enough experience to do this yourself).
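For anyone who wants to try the idea outside Notepad++, here is a minimal Python sketch of that first phase. It is only an illustration, not the Notepad++ regex itself: Python's re module has no \R, so \r?\n stands in for it, and the names RECORD and duplicate_records are made up for this example.

```python
import re

# Assumption: r"\r?\n" stands in for Notepad++'s \R, which Python's re lacks.
# DOTALL lets "." cross line breaks; MULTILINE lets "^" match at line starts.
RECORD = re.compile(r"^##xx##\r?\n(.+?)^##xy##\r?\n", re.DOTALL | re.MULTILINE)


def duplicate_records(text: str) -> str:
    """Wrap each record in quote lines and emit its body twice (hypothetical helper)."""
    # Group 1 already ends with a line break, so the delimiter lines follow it directly.
    # The delimiter lines are written with \n here; use \r\n for Windows-style files.
    return RECORD.sub(r'"\n\1";"\n\1"\n', text)


sample = "##xx##\n☐ Question1: a\n☑ Question2: b\n##xy##\n"
phase1 = duplicate_records(sample)
print(phase1)
```

Text outside the ##xx## / ##xy## pairs is left untouched, since only complete records match.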
The second phase is complex, but has been solved in this community many times. If you search the community for posts with text “FR BR BSR ESR” you will find many examples, such as this one: https://community.notepad-plus-plus.org/topic/22667/how-to-delete-but-only-in-certain-lines
Once you tune the two regexes, you can create a macro that performs both as a single action. You would do this in the working file tab, and then you are free to “save as…” or select-copy-paste to new, undo (Ctrl+Z), and so on. If you wish, you could also have the macro copy all and paste into a new file tab as Peter showed.
@peterjones, My guess is that they got in trouble doing copy and paste actions some of which don’t work as one would hope in macros.
-
Thanks PeterJones and Neil Schipper for your help.
I’ll try again and hope it works.
-
Hello, @venus642, @peterjones, @neil-schipper and All,
Like the other posters, I propose a solution in two regex phases !
So, from your INPUT text :
##xx##
☐ Question1: L123orem i>psum dolor
☐ Question2: sit a>sdfs>m213et,
☑ Question3: con><<set>easdtur sadipscing
##xy##
##xx##
☐ Question1: elifdf343tr, 1sed 23diam nonumyfdg
☐ Question2: 32423df34
☑ Question3: eirmd45345ocxfdvcxvd tem123por
☑ Question4: 543534 in34vifgdunt ut labore
##xy##
Use this first regex S/R :
- SEARCH (?s-i)^##xx##\R(.+?)^##xy##\R
- REPLACE "\r\n\1";"\r\n\1"\r\n
Which :
- Duplicates the questions block
- Changes ##xx## and ##xy## into a single " char
- Separates each block from its duplicate with a line ";"
So, you will get this temporary result :
"
☐ Question1: L123orem i>psum dolor
☐ Question2: sit a>sdfs>m213et,
☑ Question3: con><<set>easdtur sadipscing
";"
☐ Question1: L123orem i>psum dolor
☐ Question2: sit a>sdfs>m213et,
☑ Question3: con><<set>easdtur sadipscing
"
"
☐ Question1: elifdf343tr, 1sed 23diam nonumyfdg
☐ Question2: 32423df34
☑ Question3: eirmd45345ocxfdvcxvd tem123por
☑ Question4: 543534 in34vifgdunt ut labore
";"
☐ Question1: elifdf343tr, 1sed 23diam nonumyfdg
☐ Question2: 32423df34
☑ Question3: eirmd45345ocxfdvcxvd tem123por
☑ Question4: 543534 in34vifgdunt ut labore
"
Then, use this second regex S/R :
- SEARCH ☑(?=(?s:(?!").)+";")
- REPLACE ☐
which :
- Replaces any ☑ character with ☐ ONLY IF it is followed, further on, by the string ";"
- And, also, ONLY IF every character between the ☑ char and the string ";" is different from a double-quote (")
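To see the two conditions in action outside Notepad++, here is a small Python sketch. The search pattern is the one given above (Python's re accepts the same lookahead syntax); the shortened sample text and the names UNCHECK and uncheck_first_copy are assumptions made up for this illustration.

```python
import re

# Same search pattern as above: a ☑ followed, further on, by ";" with no
# double-quote anywhere in between.
UNCHECK = re.compile(r'☑(?=(?s:(?!").)+";")')


def uncheck_first_copy(text: str) -> str:
    """Replace ☑ with ☐ in the first copy of each record only (hypothetical helper)."""
    return UNCHECK.sub("☐", text)


# A shortened stand-in for the temporary result of the first S/R.
temp = (
    '"\n'
    "☐ Question1: a\n"
    "☑ Question2: b\n"
    '";"\n'
    "☐ Question1: a\n"
    "☑ Question2: b\n"
    '"\n'
)
final = uncheck_first_copy(temp)
print(final)
```

In the second copy, the first double-quote met after each ☑ is the closing " line, which is not the start of ";", so the lookahead fails there and the ☑ is kept.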
So, here is your expected OUTPUT text :
"
☐ Question1: L123orem i>psum dolor
☐ Question2: sit a>sdfs>m213et,
☐ Question3: con><<set>easdtur sadipscing
";"
☐ Question1: L123orem i>psum dolor
☐ Question2: sit a>sdfs>m213et,
☑ Question3: con><<set>easdtur sadipscing
"
"
☐ Question1: elifdf343tr, 1sed 23diam nonumyfdg
☐ Question2: 32423df34
☐ Question3: eirmd45345ocxfdvcxvd tem123por
☐ Question4: 543534 in34vifgdunt ut labore
";"
☐ Question1: elifdf343tr, 1sed 23diam nonumyfdg
☐ Question2: 32423df34
☑ Question3: eirmd45345ocxfdvcxvd tem123por
☑ Question4: 543534 in34vifgdunt ut labore
"
Best Regards
guy038
-
-
Hi @guy038,
Your solution made me realize I made an error in specifying the substitution of the ☑s in the second group rather than the first!
However, your solution will fail to replace checked boxes if the text in a question contains a ", which is fairly foreseeable. (It’s also possible but much less likely that ";" appears in a question.) (The OP’s samples strongly suggest that question text should be considered “anything except newline”.) So it’s safer that the record delimiters include their newlines:
☑(?=(?s:(?!"\R).)+";"\R)
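A quick way to compare the two patterns is a small Python sketch like the one below. It rests on some assumptions: Python's re has no \R, so \r?\n is used in its place, and the sample question text and the names SIMPLE and SAFER are invented for the test.

```python
import re

# Original pattern: any double-quote between ☑ and ";" blocks the match.
SIMPLE = re.compile(r'☑(?=(?s:(?!").)+";")')
# Safer pattern: only a double-quote followed by a line break blocks the match.
# Assumption: r"\r?\n" replaces \R, which Python's re does not support.
SAFER = re.compile(r'☑(?=(?s:(?!"\r?\n).)+";"\r?\n)')

# Temporary result of phase one, with a double-quote inside the question text.
temp = (
    '"\n'
    '☑ Question1: the "best" answer\n'
    '";"\n'
    '☑ Question1: the "best" answer\n'
    '"\n'
)

simple_result = SIMPLE.sub("☐", temp)
safer_result = SAFER.sub("☐", temp)
print(simple_result == temp)  # the simple pattern changes nothing here
print(safer_result)
```

With this sample the simple pattern leaves both ☑ alone, while the safer one unchecks only the first copy. As far as I can tell, a question line that itself ends with a " would still look like a delimiter to the safer pattern, so that case stays unsolved.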
-
Hi, @venus642, @neil-schipper and All,
Oh… yes, quite right, and a clever initiative, Neil, indeed !
Of course, as the " character and the string ";" seemed to be separators, I presumed that they were not used within the content of the questions. But I agree that this new formulation is safer !
So, @venus642, the second regex S/R to use is, preferably :
- SEARCH ☑(?=(?s:(?!"\R).)+";"\R)
- REPLACE ☐
Cheers,
guy038
-