Help. Need regex

Евгений Фокин

Hello everyone.
I need some help.
I have a text:

Now I need to delete every block from <text to </text> that has Russian characters in its replacement.

The file has 2kk (2 million) lines, so it is very hard to do without a regex.

Terry R

I think the following regex will do what you want. I’ve made provision for removing the CR/LF after each instance so that you don’t end up with blank lines.

Find what: (?s)<text.+?</text>\R?
Replace with: (leave this field empty)

The ? after the \R covers the case where the last line of the file needs removing. That line has no CR/LF after it, so the ? lets it still be matched if it fits the criteria.
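
For anyone who prefers to script this instead of using the Replace dialog, here is a minimal sketch of the same find-and-replace in Python. It is an illustration only: Python's re module has no \R, so the optional line break is spelled out by hand, and the sample data and the function name remove_all_blocks are made up for the example.

```python
import re

# Same idea as (?s)<text.+?</text>\R? in Notepad++.
# re.DOTALL plays the role of (?s); (?:\r\n|\r|\n)? stands in for \R?.
BLOCK = re.compile(r"<text.+?</text>(?:\r\n|\r|\n)?", re.DOTALL)


def remove_all_blocks(data: str) -> str:
    """Delete every <text ...>...</text> block and the line break after it."""
    return BLOCK.sub("", data)


# Hypothetical sample; the real file layout is not shown in the thread.
sample = (
    "header\r\n"
    '<text autoid="1">\r\nfirst\r\n</text>\r\n'
    "middle\r\n"
    '<text autoid="2">\r\nlast\r\n</text>'  # last line, no CR/LF after it
)

result = remove_all_blocks(sample)
print(repr(result))
```

The second block sits on the last line with no line break after it, and it is still removed because the line-break group is optional.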

Terry

Terry R

First off, apologies @Евгений-Фокин, I misread your question. I’ve come back to it now because I didn’t see a reply from you to my solution. @Claudia-Frank upvoted my answer, but had also misread the question.

I can see now that you in fact only wanted to remove a
<text autoid…</text> block IF the line starting with <replacement> in that group of lines had Russian characters in it.

I have a revised regex which I think fits what you intended. I would suggest testing it in ‘Find’ mode first. You could have it bookmark all the instances that it thinks fit the criteria, then look at those before deciding to remove them.
Find what: (?s)<text.+?<replacement><\!\[CDATA\[[\x{080}-\x{4af}].+?text>\R?
Replace with: (leave this field empty)

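The part that checks for a Russian character is the [\x{080}-\x{4af}] class; I hope I’ve got that range correct. Note that it is wider than Cyrillic alone (the Unicode Cyrillic block is U+0400 to U+04FF), so it also accepts accented Latin and Greek characters. It only checks the first character of the replacement, so hopefully no replacement starts with a non-Cyrillic character and has Cyrillic afterwards; such a line would be missed.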
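
As a cross-check before deleting anything from a 2kk-line file, the same filtering can be sketched in Python. This is not the regex above but a two-step variant: it first isolates each <text…</text> block, then looks inside that block's <replacement> CDATA for a character in the Cyrillic block (U+0400 to U+04FF) at any position, not only the first. The block layout in the sample is an assumption, since the original text is not shown in the thread, and the names drop_if_cyrillic and remove_russian_blocks are made up for the example.

```python
import re

# One <text ...>...</text> block plus an optional trailing line break.
BLOCK = re.compile(r"<text.+?</text>(?:\r\n|\r|\n)?", re.DOTALL)
# Assumed layout: the replacement text sits inside a CDATA section.
REPLACEMENT = re.compile(r"<replacement><!\[CDATA\[(.*?)\]\]>", re.DOTALL)
# Unicode Cyrillic block only.
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def drop_if_cyrillic(match: re.Match) -> str:
    """Return '' for blocks whose replacement contains Cyrillic, else the block."""
    block = match.group(0)
    repl = REPLACEMENT.search(block)
    if repl and CYRILLIC.search(repl.group(1)):
        return ""
    return block


def remove_russian_blocks(data: str) -> str:
    return BLOCK.sub(drop_if_cyrillic, data)


# Hypothetical sample data.
sample = (
    '<text autoid="1">\n<replacement><![CDATA[Hello]]></replacement>\n</text>\n'
    '<text autoid="2">\n<replacement><![CDATA[Привет]]></replacement>\n</text>\n'
    '<text autoid="3">\n<replacement><![CDATA[A Привет]]></replacement>\n</text>'
)

cleaned = remove_russian_blocks(sample)
print(cleaned)
```

Because each block is examined on its own, a block with an English replacement that happens to sit just before a Russian one is left untouched.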

I would like to hear back from you regardless of whether it helps or not. If it does NOT help, we on the forum are only too willing to offer other possible solutions.

Terry