Search and replace / regex
-
Hi community.
I have a corrupted iCalendar file and want to use search and replace to delete the garbage between two tags:

END:VCALENDAR
3F68636A-A88D-4B6D-95C7-DC5B65910335.ics ZÔúƒ4504b552cef6ac7c1141ef12fba9a94a ²VEVENT ZÚ p ZÚˆ3F68636A-A88D-4B6D-95C7-DC5B65910335€ $ (È„ P€ „ Í`õ ¢BEGIN:VCALENDAR

Everything between END:VCALENDAR and BEGIN:VCALENDAR should be deleted.
What search pattern do I have to use?
Thanks, and best regards,
Frank
-
Using the Regular expression search mode in the Find dialog,
Find what: (?s)(?<=BEGIN:VCALENDAR).*?(?=\REND:VCALENDAR)
and leave Replace with empty, then press Replace All.
Note: you cannot use Replace to step through the file (bug) and see what gets replaced.
Cheers
Claudia
-
Did you get BEGIN and END mixed up? I found that your regexp didn't work on the sample data, but this one seems to (the biggest change is swapping BEGIN and END):
(?s-i)(?<=END:VCALENDAR).*?(?=BEGIN:VCALENDAR)
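To check the swapped lookaround pattern outside Notepad++, here is a minimal Python sketch using the standard `re` module. It is an illustration under assumptions, not Scott's exact setup: `strip_garbage` is a hypothetical helper name, the sample is a shortened excerpt of the OP's data, `(?s)` maps to `re.DOTALL`, and `-i` needs no equivalent because Python matches case-sensitively by default.

```python
import re

# Equivalent of (?s-i)(?<=END:VCALENDAR).*?(?=BEGIN:VCALENDAR):
# the lookbehind and lookahead keep both tags out of the match,
# so only the junk between them is replaced by an empty string.
PATTERN = re.compile(r"(?<=END:VCALENDAR).*?(?=BEGIN:VCALENDAR)", re.DOTALL)

def strip_garbage(text: str) -> str:
    """Hypothetical helper: delete everything between END:VCALENDAR and the next BEGIN:VCALENDAR."""
    return PATTERN.sub("", text)

# Shortened excerpt of the OP's sample (no line break before BEGIN).
sample = "END:VCALENDAR\n3F68636A-A88D-4B6D-95C7-DC5B65910335.ics ZÔúƒ ²VEVENT € (È„ ¢BEGIN:VCALENDAR"
print(strip_garbage(sample))  # prints END:VCALENDARBEGIN:VCALENDAR
```

Because `.*?` is lazy, each deletion stops at the first BEGIN:VCALENDAR that follows, so several corrupted spots in one file are cleaned independently.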
Perhaps the OP would like to be able to cycle through the matches individually. The above solutions don't allow that, but this one does:
Find what zone:
(?s-i)END:VCALENDAR.*?BEGIN:VCALENDAR
Replace with zone:
END:VCALENDARBEGIN:VCALENDAR
Search mode: Regular expression
-
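Scott's non-lookaround variant includes both tags in the match, so the replacement has to write them back. A rough Python analogue is sketched below, assuming a made-up two-block test text: `re.finditer` lists the matches one at a time (loosely what pressing Replace repeatedly lets you review), and `re.sub` performs the equivalent of Replace All.

```python
import re

# Non-lookaround variant: both tags are part of the match,
# so the replacement string writes them back without the junk.
PATTERN = re.compile(r"END:VCALENDAR.*?BEGIN:VCALENDAR", re.DOTALL)
REPLACEMENT = "END:VCALENDARBEGIN:VCALENDAR"

# Made-up text with two corrupted spots.
text = (
    "A\nEND:VCALENDAR\njunk-1 BEGIN:VCALENDAR\nB\n"
    "END:VCALENDAR\njunk-2 BEGIN:VCALENDAR\nC\n"
)

# Review each match individually before replacing anything.
matches = [m.group() for m in PATTERN.finditer(text)]
for found in matches:
    print(repr(found))

# Replace every match at once.
cleaned = PATTERN.sub(REPLACEMENT, text)
print(repr(cleaned))
```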
Hi Scott,
thanks for the heads-up. Yes, I did: I was just thinking BEGIN comes before END, but I guess that is where the OP's issue comes from. But the regex itself should work once the END/BEGIN terms are swapped, shouldn't it?
(?s)(?<=END:VCALENDAR).*?(?=\RBEGIN:VCALENDAR)
does it for me.
Cheers
Claudia
-
Starting with your original regexp, I added the -i because the OP's spec was definitely uppercase, and I deleted your \R because there didn't seem to be a requirement that a line-ending occur before the BEGIN; indeed, there doesn't appear to be one in the sample data. Thus, copying the OP's sample data to an N++ tab and trying your original regexp on it yielded no matches for me... hmm...
-
You are right: when copying the sample data there seems to be no EOL, although from what is displayed I assumed there was one.
But even without \R it seems to work for me. This can't be a Linux/Windows issue, can it?
Note: I shortened the line to fit on the screen, but it works with the original data as well.
Cheers
Claudia
-
Ah, now I see, I guess: you used the original data without the EOL, but with my regex, which included the EOL.
Yes, that makes sense; it does not work.
Cheers
Claudia
-
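The difference discussed above (a lookahead that requires a line break before BEGIN:VCALENDAR versus one that does not) can be reproduced with a small Python sketch. Two assumptions: Python's `re` has no `\R`, so `(?:\r\n|\n|\r)` stands in for it (Notepad++'s `\R`, from the Boost engine it uses, also accepts a few other line-separator characters), and both samples are shortened stand-ins for the OP's data.

```python
import re

# (?:\r\n|\n|\r) approximates \R; Python's re has no \R escape.
WITH_EOL = re.compile(r"(?<=END:VCALENDAR).*?(?=(?:\r\n|\n|\r)BEGIN:VCALENDAR)", re.DOTALL)
WITHOUT_EOL = re.compile(r"(?<=END:VCALENDAR).*?(?=BEGIN:VCALENDAR)", re.DOTALL)

samples = {
    "as posted": "END:VCALENDAR\njunk ¢BEGIN:VCALENDAR",       # no line break before BEGIN
    "as displayed": "END:VCALENDAR\njunk ¢\nBEGIN:VCALENDAR",  # line break before BEGIN
}

# For each sample: (does the \R version match?, does the version without \R match?)
results = {
    name: (bool(WITH_EOL.search(data)), bool(WITHOUT_EOL.search(data)))
    for name, data in samples.items()
}
print(results)
```

Only the pattern without the line-break requirement finds the junk in the data as posted, which matches Scott's observation.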
Hello, @frank-kirschner, @claudia-frank, @scott-sumner and All,
I thought about a third regex which, in addition, checks whether:
- the END:VCALENDAR string is preceded by a line-break
- the BEGIN:VCALENDAR string is followed by a line-break
and which, in the replacement, adds each line-break that is not already present in the O.P.'s text.
So, assuming the four possible cases below:
blah blah bla bla blaEND:VCALENDAR 3F68636A-A88D-4B6D-95C7-DC5B65910335.ics ZÔúƒ4504b552cef6ac7c1141ef12fba9a94a ²VEVENT ZÚ p ZÚˆ3F68636A-A88D-4B6D-95C7-DC5B65910335€ $ (È„ P€ „ Í`õ ¢BEGIN:VCALENDARblah blah blah bla bla...

blah blah bla bla blaEND:VCALENDAR 3F68636A-A88D-4B6D-95C7-DC5B65910335.ics ZÔúƒ4504b552cef6ac7c1141ef12fba9a94a ²VEVENT ZÚ p ZÚˆ3F68636A-A88D-4B6D-95C7-DC5B65910335€ $ (È„ P€ „ Í`õ ¢BEGIN:VCALENDAR
blah blah blah... bla bla...

blah blah bla bla bla
END:VCALENDAR 3F68636A-A88D-4B6D-95C7-DC5B65910335.ics ZÔúƒ4504b552cef6ac7c1141ef12fba9a94a ²VEVENT ZÚ p ZÚˆ3F68636A-A88D-4B6D-95C7-DC5B65910335€ $ (È„ P€ „ Í`õ ¢BEGIN:VCALENDARblah blah blah bla bla...

blah blah bla bla bla
END:VCALENDAR 3F68636A-A88D-4B6D-95C7-DC5B65910335.ics ZÔúƒ4504b552cef6ac7c1141ef12fba9a94a ²VEVENT ZÚ p ZÚˆ3F68636A-A88D-4B6D-95C7-DC5B65910335€ $ (È„ P€ „ Í`õ ¢BEGIN:VCALENDAR
blah blah blah bla bla...
then the regex S/R:
SEARCH
(?s-i)((\R)?END:VCALENDAR).*?(BEGIN:VCALENDAR(\R)?)
REPLACE
(?2:\r\n)\1\r\n\3(?4:\r\n)
would give the following text (four identical blocks of text):
blah blah bla bla bla
END:VCALENDAR
BEGIN:VCALENDAR
blah blah blah bla bla...

blah blah bla bla bla
END:VCALENDAR
BEGIN:VCALENDAR
blah blah blah... bla bla...

blah blah bla bla bla
END:VCALENDAR
BEGIN:VCALENDAR
blah blah blah bla bla...

blah blah bla bla bla
END:VCALENDAR
BEGIN:VCALENDAR
blah blah blah bla bla...
Et voilà !
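For anyone wanting to apply the same normalization from a script, here is a minimal Python sketch of this S/R, built on assumptions: `(?:\r\n|\n|\r)` approximates `\R`, and because Python replacement strings have no conditional syntax like `(?2:\r\n)`, a callback function (the hypothetical `replacement`) makes the same decisions. Like the original REPLACE string, it always writes Windows-style `\r\n`. The four test strings are shortened versions of the four cases.

```python
import re

# Assumed approximation of Notepad++'s \R.
EOL = r"(?:\r\n|\n|\r)"

# Same group layout as the SEARCH regex: 1 = optional break + END, 2 = that break,
# 3 = BEGIN + optional break, 4 = that break. (?s) becomes re.DOTALL.
SEARCH = re.compile(rf"(({EOL})?END:VCALENDAR).*?(BEGIN:VCALENDAR({EOL})?)", re.DOTALL)

def replacement(m: re.Match) -> str:
    # (?2:\r\n): write a line-break before END only if group 2 did not match
    before = "" if m.group(2) is not None else "\r\n"
    # (?4:\r\n): write a line-break after BEGIN only if group 4 did not match
    after = "" if m.group(4) is not None else "\r\n"
    # \1\r\n\3: both tags, with any captured breaks, separated by a line-break
    return before + m.group(1) + "\r\n" + m.group(3) + after

def normalize(text: str) -> str:
    return SEARCH.sub(replacement, text)

cases = [
    "blaEND:VCALENDAR junk BEGIN:VCALENDARblah",              # no breaks
    "blaEND:VCALENDAR junk BEGIN:VCALENDAR\r\nblah",          # break after BEGIN only
    "bla\r\nEND:VCALENDAR junk BEGIN:VCALENDARblah",          # break before END only
    "bla\r\nEND:VCALENDAR junk BEGIN:VCALENDAR\r\nblah",      # both breaks
]
for case in cases:
    print(repr(normalize(case)))
```

All four cases come out identical, mirroring the four identical blocks above.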
Notes:

- You may either click the Replace button several times, or click the Replace All button once.
- In search:
  - First, the (?s-i) modifiers force:
    - the search to be performed in a case-sensitive way (NON-insensitive!)
    - the special dot character . to match any single character, even an End of Line one
  - Then, group 1 contains the string END:VCALENDAR, possibly preceded by a line-break.
  - The part (\R)? (identical to the form (\R){0,1}) represents an optional line-break (group 2).
  - Now, the .*? part (identical to .{0,}?) stands for the shortest possible run of any characters between the two strings END:VCALENDAR and BEGIN:VCALENDAR.
  - Group 3 contains the string BEGIN:VCALENDAR, possibly followed by a line-break.
  - Finally, the part (\R)? (identical to the form (\R){0,1}) represents an optional line-break (group 4).
- In replacement:
  - The conditional replacement (?2:\r\n) writes a line-break before the string END:VCALENDAR only if group 2 (\R) did not match.
  - The block \1\r\n\3 writes back the strings END:VCALENDAR and BEGIN:VCALENDAR (together with any line-breaks captured in groups 1 and 3), separated by a line-break (\r\n).
  - The conditional replacement (?4:\r\n) writes a line-break after the string BEGIN:VCALENDAR only if group 4 (\R) did not match.
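The roles of groups 2 and 4, and of the lazy .*?, described in these notes can be checked with a short Python sketch. It assumes `(?:\r\n|\n|\r)` as a stand-in for `\R` and uses made-up test strings; it only inspects what the search captures and does not perform the replacement.

```python
import re

# Assumed approximation of Notepad++'s \R.
EOL = r"(?:\r\n|\n|\r)"
SEARCH = re.compile(rf"(({EOL})?END:VCALENDAR).*?(BEGIN:VCALENDAR({EOL})?)", re.DOTALL)

cases = {
    "no breaks": "xEND:VCALENDAR junk BEGIN:VCALENDARy",
    "break after BEGIN": "xEND:VCALENDAR junk BEGIN:VCALENDAR\r\ny",
    "break before END": "x\r\nEND:VCALENDAR junk BEGIN:VCALENDARy",
    "both breaks": "x\r\nEND:VCALENDAR junk BEGIN:VCALENDAR\r\ny",
}

# Group 2 / group 4 are None when the optional (\R)? did not match.
captured = {}
for name, text in cases.items():
    m = SEARCH.search(text)
    captured[name] = (m.group(2), m.group(4))
print(captured)

# Lazy .*? stops at the first BEGIN:VCALENDAR; greedy .* runs to the last one
# and would also delete the legitimate text between two corrupted spots.
two_blocks = "END:VCALENDAR junk1 BEGIN:VCALENDAR keep me END:VCALENDAR junk2 BEGIN:VCALENDAR"
lazy = re.findall(r"END:VCALENDAR.*?BEGIN:VCALENDAR", two_blocks, re.DOTALL)
greedy = re.findall(r"END:VCALENDAR.*BEGIN:VCALENDAR", two_blocks, re.DOTALL)
print(len(lazy), len(greedy))
```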
Cheers,
guy038
-