Help with Regex
-
My knowledge of regex is limited to adding text at the beginning or the end of a line.
So I tried using ChatGPT to get a regex that does what I want, but without any success. I hope you will understand my explanation of what I'm looking to do.
First, the EOL conversion is set to LF.
I have a text with over 30,000 lines.
It contains two kinds of listings:
First:
Movie Title
Director
Actor 1
Actor 2
Actor 3
and so on

Second:
Movie Title
Actor 1
Actor 2
Actor 3
and so on

What I want to do is:
When “Director” is present, remove the first line (“Movie Title”) and all the lines that begin with “Actor”. If “Director” is not present, keep the paragraph as it is.
Thanks for your help
-
@Pangolin
Looks like this should work:
Replace
(Movie Title:.+(?:\R|\Z))(Director:.+(?:\R|\Z))?((?:Actor.*?:.+(?:\R|\Z))*)
with
?2\2:\1\3
Make sure “. matches newline” is turned off.
Explanation:
- (?:\R|\Z) matches any newline (including CR, CRLF and LF) or the end of the file.
- There are three capture groups: one for the movie title, one for the director, and one for zero or more actors.
- In the replacement, the syntax ?2\2:\1\3 means we replace with the second capture group (director) only if the director was matched. Otherwise, we replace with the other two capture groups (movie title and any number of actors).
I was able to convert
Movie Title: Baz to the Bone
Director: Bazeven Bazlberg
Actor 1: baz bazsson
Actor 2: foo foosson
Actor 3: bar barsson
Movie Title: Foo Manchu
Actor 1: foo foostein
Actor 2: bar barstein
Actor 3: baz bazstein
Movie Title: Conan the BarBarian
Director: Baron Dracula
Actor 1: baz bazmeister
Actor 2: foo foomeister
Actor 3: bar barmeister
to
Director: Bazeven Bazlberg
Movie Title: Foo Manchu
Actor 1: foo foostein
Actor 2: bar barstein
Actor 3: baz bazstein
Director: Baron Dracula
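For anyone who wants to check the same logic outside Notepad++, here is a rough Python sketch, assuming LF line endings as in the question. Python's re module has neither \R nor the ?2\2:\1\3 conditional replacement syntax, so (?:\n|\Z) stands in for (?:\R|\Z) and a function does the conditional part. The names EOL, PATTERN, conditional_replace and keep_directors are hypothetical, chosen only for this illustration.

```python
import re

# Assumption: LF line endings, as in the question. Python's re module has no \R,
# so (?:\n|\Z) stands in for (?:\R|\Z): a newline or the end of the text.
EOL = r"(?:\n|\Z)"

PATTERN = re.compile(
    rf"(Movie Title:.+{EOL})"      # group 1: the title line
    rf"(Director:.+{EOL})?"        # group 2: an optional director line
    rf"((?:Actor.*?:.+{EOL})*)"    # group 3: zero or more actor lines
)


def conditional_replace(m):
    # Python has no ?2\2:\1\3 replacement syntax, so a function does the same job:
    # keep only the director line if group 2 matched, else keep title + actors.
    if m.group(2) is not None:
        return m.group(2)
    return m.group(1) + m.group(3)


def keep_directors(text):
    return PATTERN.sub(conditional_replace, text)


sample = (
    "Movie Title: Baz to the Bone\n"
    "Director: Bazeven Bazlberg\n"
    "Actor 1: baz bazsson\n"
    "Actor 2: foo foosson\n"
    "Actor 3: bar barsson\n"
    "Movie Title: Foo Manchu\n"
    "Actor 1: foo foostein\n"
    "Actor 2: bar barstein\n"
    "Actor 3: baz bazstein\n"
    "Movie Title: Conan the BarBarian\n"
    "Director: Baron Dracula\n"
    "Actor 1: baz bazmeister\n"
    "Actor 2: foo foomeister\n"
    "Actor 3: bar barmeister"  # no final newline: \Z covers the last line
)

print(keep_directors(sample), end="")
```

Run on this sample, it should print the same six lines as the converted text shown above.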
Note that the ? after the second capture group is optional for this use case. The find/replace works fine either way, and performance is slightly faster without the ?. However, omitting the ? means the regex fails to match any block where the director is not listed, which could be a problem for related use cases.
-
Hello, @pangolin, @mark-olson and All,
Here is another formulation! Just one constraint: add one or two line-breaks at the very end of your file!
SEARCH
(?-si)^Movie Title.*\RDirector.*\R(?:Actor.*\R)*\R+
REPLACE
Leave EMPTY
Notes:
- The strings Movie Title, Director and Actor must have this exact case
- The strings Movie Title, Director and Actor may or may not be followed by a name
- The string Actor may be absent from a paragraph block
- The end of each block to delete may contain several consecutive line-breaks
Test it, for instance, against this text:
Movie Title
Director
Actor
Actor
Actor

Movie Title
Actor 1
Actor 2
Actor 3

Movie Title
Director

Movie Title
Director
Test 1
Test 2
Test 3

Movie Title
Director ...
Actor ...
Actor ...
Actor ...
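A similar Python sketch of this search, again assuming LF line endings: \n replaces \R, re.MULTILINE lets ^ match at every line start, and (?-si) is dropped because Python's defaults are already case-sensitive with . not matching newlines. The test string assumes blank lines between blocks and two line-breaks at the end of the file; BLOCK and delete_director_blocks are hypothetical names for illustration only.

```python
import re

# Assumptions: LF line endings (\n stands in for \R, which Python's re lacks);
# (?-si) is omitted since Python is case-sensitive and "." excludes newlines
# by default. re.MULTILINE makes ^ match at the start of every line.
BLOCK = re.compile(r"^Movie Title.*\nDirector.*\n(?:Actor.*\n)*\n+", re.MULTILINE)


def delete_director_blocks(text):
    # Replace every matching block with nothing (REPLACE field left empty).
    return BLOCK.sub("", text)


# Test text modeled on the one above, with blank lines between blocks and
# two line-breaks at the very end of the file.
test = (
    "Movie Title\nDirector\nActor\nActor\nActor\n\n"
    "Movie Title\nActor 1\nActor 2\nActor 3\n\n"
    "Movie Title\nDirector\n\n"
    "Movie Title\nDirector\nTest 1\nTest 2\nTest 3\n\n"
    "Movie Title\nDirector ...\nActor ...\nActor ...\nActor ...\n\n"
)

print(delete_director_blocks(test), end="")

# Without the extra line-break(s) at the end, the last block survives,
# because \n+ needs at least one more line-break after the final Actor line.
print(delete_director_blocks(test.rstrip("\n")).endswith("Actor ..."))
```

The block with "Test 1" lines is kept because those lines are neither Actor lines nor a blank line, so \n+ cannot match after the Director line.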
Best regards,
guy038
-
Thank you very much Mark and Guy for taking the time to answer my request. It is really appreciated!
Have a nice evening