Help with Regex
Pangolin last edited by Pangolin
My knowledge of regex is limited to adding text at the beginning or the end of a line.
I tried using ChatGPT to get a regex that does what I want, but without any success. I hope my explanation of what I’m trying to do is clear.
First, EOL conversion is set to LF.
I have a text file with over 30,000 lines,
with two kinds of listings:
and so on
and so on
What I want to do is:
When “Director” is present, remove the first line (“Movie Title”) and all the lines that begin with “Actor”.
If “Director” is not present, keep the paragraph as it is.
Thanks for your help
Mark Olson last edited by Mark Olson
Looks like this should work:
- “. matches newline” is turned off.
- (?:\R|\Z) matches any newline (including CR, CRLF, LF) or the end of the file.
- There are three capture groups: one for the movie title, one for the director, and one for zero or more actors.
- In the replacement, the syntax ?2\2|\1\3 means we should replace with the second capture group (director) only if the director was matched. Otherwise, we should replace with the other two capture groups (movie title and any number of actors).
I was able to convert
Movie Title: Baz to the Bone
Director: Bazeven Bazlberg
Actor 1: baz bazsson
Actor 2: foo foosson
Actor 3: bar barsson
Movie Title: Foo Manchu
Actor 1: foo foostein
Actor 2: bar barstein
Actor 3: baz bazstein
Movie Title: Conan the BarBarian
Director: Baron Dracula
Actor 1: baz bazmeister
Actor 2: foo foomeister
Actor 3: bar barmeister
into
Director: Bazeven Bazlberg
Movie Title: Foo Manchu
Actor 1: foo foostein
Actor 2: bar barstein
Actor 3: baz bazstein
Director: Baron Dracula
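For anyone who would rather script this than run it in the editor, here is a rough Python sketch of the same three-capture-group idea. The pattern below is my own assumption for illustration, not the exact regex from this post. Python's re module has neither \R nor conditional replacement strings, so the sketch assumes LF line endings and uses a replacement function instead. The names BLOCK, keep_director_only and clean are made up for this example.

```python
import re

# Hypothetical pattern (an assumption, not the regex posted in this thread).
# Without re.DOTALL, "." does not match a newline, which mirrors having
# ". matches newline" turned off. Assumes LF line endings.
BLOCK = re.compile(
    r"^(Movie Title:.*(?:\n|\Z))"   # group 1: the movie title line
    r"(Director:.*(?:\n|\Z))?"      # group 2: the director line, if present
    r"((?:Actor.*(?:\n|\Z))*)",     # group 3: zero or more actor lines
    re.MULTILINE,
)


def keep_director_only(match):
    """Stand-in for the conditional replacement: director if matched, else title + actors."""
    if match.group(2) is not None:
        return match.group(2)
    return match.group(1) + match.group(3)


def clean(text):
    """Apply the replacement to every movie block found in text."""
    return BLOCK.sub(keep_director_only, text)
```

Calling clean() on the sample text above should leave only the two Director lines plus the untouched Foo Manchu block. Any blank lines between blocks are not part of a match, so they pass through unchanged.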
Note that a ? after the second capture group is optional for this use case. The find/replace works fine either way, and performance is slightly faster if you don’t include the ?.
However, not including the ? comes at the cost of failing to match anything where the director is not listed, which could be problematic for related use cases.
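To see that trade-off in isolation, here is a small Python sketch that builds the pattern with and without the ? on the director group. The pattern is again my own simplified assumption (LF endings, every line terminated by a newline), and block_pattern is a hypothetical helper name.

```python
import re


def block_pattern(director_optional):
    """Build a simplified block pattern; the director group gets a '?' only if requested."""
    quantifier = "?" if director_optional else ""
    return re.compile(
        r"^(Movie Title:.*\n)"                 # group 1: movie title line
        r"(Director:.*\n)" + quantifier +      # group 2: director line
        r"((?:Actor.*\n)*)",                   # group 3: zero or more actor lines
        re.MULTILINE,
    )


# A block with no director line.
no_director = "Movie Title: Foo Manchu\nActor 1: foo foostein\n"

optional_hits = block_pattern(True).findall(no_director)
required_hits = block_pattern(False).findall(no_director)
```

With the ?, the block without a director is still matched (and then put back unchanged by the replacement). Without it, the block is simply skipped, so required_hits stays empty. For this task both give the same final text, but anything else that relies on matching every block would miss the director-less ones.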
guy038 last edited by guy038
Hello, @pangolin, @mark-olson and All,
Here is another formulation! Just one constraint: add one or two line-break(s) at the very end of your file!
- The strings Movie Title, Director and Actor must have this exact case
- The strings Movie Title, Director and Actor may or may not be followed by a name
- The string Actor may be absent from a paragraph block
- The end of each block to delete may contain several consecutive
Test it, for instance, against this text :
Movie Title
Director
Actor
Actor
Actor
Movie Title
Actor 1
Actor 2
Actor 3
Movie Title
Director
Movie Title
Director
Test 1
Test 2
Test 3
Movie Title
Director ...
Actor ...
Actor ...
Actor ...
Pangolin last edited by
Thank you very much Mark and Guy for taking the time to answer my request. It is really appreciated!
Have a nice evening