    .ics file selection problem

    • Marcin JewiarzM
      Marcin Jewiarz @PeterJones
      last edited by

      @PeterJones said in .ics file selection problem:

      Have I interpreted correctly: given the data in my t

      Yes that’s what I’m looking for.

      • Terry RT
        Terry R @Marcin Jewiarz
        last edited by

        @Marcin-Jewiarz said in .ics file selection problem:

        If it could be tuned to find SUMMARY: bla bla bla in this block, that would be more than enough

        So the steps to perform on the already-extracted data are:

        1. Convert each “record set” into 1 line
        2. Mark those lines with “bla bla bla” in them
        3. Remove non-marked lines
        4. Convert the single line records back to normal

        1: We will be using the Replace function.
        Find What: (?s)\R(?!BEGIN)
        Replace With: @#@
        Search Mode must be “Regular expression”, with “Wrap around” ticked. Click on the “Replace All” button. All record sets should now be on single lines.

        2: Using the Mark function we have
        Find What: (?i-s)SUMMARY.+?\Qbla bla bla\E
        Have “Bookmark line” ticked. Replace the bla bla bla in the line above with the literal text you want to look for. You will see it is enclosed between the \Q and \E metacharacters. This lets you safely put any character in that area without worrying that some might have special meaning within the regex environment. Click on the “Mark All” button. Close the window once completed; some lines should be marked.

        3: Under Search > Bookmark, use “Remove Unmarked Lines”. At this point ONLY the record sets containing “bla bla bla” should remain.

        4: Return the lines to normal. Use the Replace function again.
        Find What: @#@
        Replace With: \r\n
        All sections of each record set should be on their own line now.
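For anyone who wants to try the same four steps outside Notepad++, here is a minimal Python sketch. It is only an illustration under some assumptions: the text uses `\n` line endings (Python's `re` module has no `\R`), the placeholder `@#@` does not occur in the data, and `re.escape` stands in for `\Q...\E`. The function name `keep_matching_records` is made up for this example.

```python
import re

MARKER = "@#@"  # same placeholder as in the steps above; assumed absent from the data


def keep_matching_records(text: str, literal: str) -> str:
    """Keep only the record sets whose SUMMARY line contains `literal`."""
    # Step 1: join each record set onto one line by replacing every
    # line break that is not followed by BEGIN.
    joined = re.sub(r"\n(?!BEGIN)", MARKER, text)

    # Step 2: "mark" the lines where SUMMARY is later followed by the
    # literal text; re.escape plays the role of \Q...\E.
    wanted = re.compile(r"(?i)SUMMARY.+?" + re.escape(literal))

    # Step 3: drop the unmarked lines.
    kept = [line for line in joined.splitlines() if wanted.search(line)]

    # Step 4: turn the placeholder back into line breaks.
    return "\n".join(kept).replace(MARKER, "\n")
```

Because the literal goes through `re.escape`, text such as `1+1` or `(draft)` can be passed in without any manual escaping.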

        I hope this helps.

        Terry

        • Marcin JewiarzM
          Marcin Jewiarz @Terry R
          last edited by

          @Terry-R said in .ics file selection problem:

          (?i-s)SUMMARY.+?\Qbla bla bla\E

          Thank you very much. This is great; I’ll certainly try to learn more about regex. This is the second time this week I’ve used it.
          The first was a simple expression found in one of the communities to extract important data from the service register from laboratory equipment. Now this. I can make a macro and use it on other files, with modifications for different SUMMARY: parameters.
          Once again, thank you @Terry-R!

          • guy038G
            guy038
            last edited by guy038

            Hello, @marcin-jewiarz, @Terry-r, @peterjones and All,

            We may solve the problem more simply, with these two other solutions:

            • First solution :

              • Use the Mark regex (?xs-i) BEGIN:VEVENT ((?!BEGIN:).)*? \Qbla bla bla\E .*? END:VEVENT\R?

              • Then, run the menu option Search > Bookmark > Remove Unmarked Lines

            • Second solution :

              • Use the regex S/R, below, with a negative look-ahead :

                • SEARCH (?xs-i) BEGIN:VEVENT \R ((?!BEGIN:|SUMMARY:\Qbla bla bla\E).)+? END:VEVENT \R?

                • REPLACE Leave EMPTY

            See an updated version of these regexes at the end of this post :

            https://community.notepad-plus-plus.org/post/58092

            For instance, given this text :

            BEGIN:VEVENT
            ...
            SUMMARY:dont include me
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:dont include me
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
               SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
                           SUMMARY:dont include me
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:dont include me
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:dont include me
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:dont include me
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:dont include me
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:dont include me
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:dont include me
            ...
            END:VEVENT
            

            After running this S/R, we get our expected results :

            BEGIN:VEVENT
            ...
               SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
            BEGIN:VEVENT
            ...
            SUMMARY:bla bla bla
            ...
            END:VEVENT
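
To experiment with the second solution outside Notepad++, the search regex can be approximated in Python. This is a hedged sketch, not the exact Boost expression: `\R` is replaced by `\n` (so `\n` line endings are assumed), `re.escape` replaces `\Q...\E`, and the name `drop_events_without` is invented for the example.

```python
import re


def drop_events_without(text: str, summary: str) -> str:
    """Delete every BEGIN:VEVENT...END:VEVENT block lacking SUMMARY:<summary>."""
    pattern = re.compile(
        r"BEGIN:VEVENT\n"
        # Consume one character at a time, but never step onto a new
        # BEGIN: or onto the SUMMARY text we want to keep.
        r"(?:(?!BEGIN:|SUMMARY:" + re.escape(summary) + r").)+?"
        r"END:VEVENT\n?",
        re.DOTALL,  # plays the role of (?s): the dot also matches line breaks
    )
    # Blocks containing the wanted SUMMARY can never be matched, so they survive.
    return pattern.sub("", text)
```

A block that contains the wanted SUMMARY text makes the look-ahead fail before END:VEVENT is reached, so only the other blocks are matched and removed.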
            

            We may also use the negative look-ahead feature of the second regex to force conditions on several lines! For instance, let’s suppose that each BEGIN:........END: block contains:

            • A line containing Line_<Letter>, and that you want to keep only the blocks with Line_A, Line_B or Line_C

            • A line containing Expression_<Letter>, and that you want to keep only the blocks with Expression_X, Expression_Y or Expression_Z

            Then, given this sample :

            BEGIN:
            ...
            Line_C
            ...
            test    Expression_X
            ...
            END:
            BEGIN:
            ...
            Expression_PTEST
            ...
                Line_B
            ...
            END:
                 BEGIN:
            ...
            Line_E
            ...
            Expression_X
            ...
            END:
            BEGIN:
            ...
            Expression_M
            ...
              Line_ATEST
            ...
                 END:
            BEGIN:
            ...
            Line_B   Expression_H
            ...
            ...
            END:
                BEGIN:
            ...
            Expression_X
            ...
                Line_K
            ...
                 END:
            BEGIN:
            ...
            Line_C
            ...
            test    Expression_U
            ...
               END:
            BEGIN:
            ...
            Test   Line_E
            ...
            Expression_Q
            ...
            END:
            BEGIN:
            ...
               Expression_X
            ...
            TEST_Line_A
            ...
                END:
                BEGIN:
            ...
            Expression_Y_TEST
            ...
               Line_E
            ...
            END:
               BEGIN:
            ...
            Line_A
            ...
               __Expression_Y__
            ...
                END:
            BEGIN:
            ...
                TESTLine_M_TEST_Expression_ZTest
            ...
            END:
            BEGIN:
            ...
            123456789Expression_Y
            ...
            Line_B_OK
            ...
            END:
            BEGIN:
            ...
            Line_MTEST
            ...
               Expression_J
            ...
            END:
                 BEGIN:
            ...
            Expression_H   Line_L
            ...
            END:
            BEGIN:
            ...
            Expression_Z
            ...
                Line_G
            ...
                  END:
            

            The following regex S/R deletes any block which does not contain the expression Line_A, Line_B or Line_C :

            • SEARCH (?xs-i) ^\h* BEGIN: ((?!BEGIN:|Line_A|Line_B|Line_C).)+? END: .*?$ \R?

            • REPLACE Leave EMPTY

            We get :

            BEGIN:
            ...
            Line_C
            ...
            test    Expression_X
            ...
            END:
            BEGIN:
            ...
            Expression_PTEST
            ...
                Line_B
            ...
            END:
            BEGIN:
            ...
            Expression_M
            ...
              Line_ATEST
            ...
                 END:
            BEGIN:
            ...
            Line_B   Expression_H
            ...
            ...
            END:
            BEGIN:
            ...
            Line_C
            ...
            test    Expression_U
            ...
               END:
            BEGIN:
            ...
               Expression_X
            ...
            TEST_Line_A
            ...
                END:
               BEGIN:
            ...
            Line_A
            ...
               __Expression_Y__
            ...
                END:
            BEGIN:
            ...
            123456789Expression_Y
            ...
            Line_B_OK
            ...
            END:
            

            This last regex S/R deletes any block which does not contain the expression Expression_X, Expression_Y or Expression_Z :

            • SEARCH (?xs-i) ^\h* BEGIN: ((?!BEGIN:|Expression_X|Expression_Y|Expression_Z).)+? END: .*?$ \R?

            • REPLACE Leave EMPTY

            Nice! Now each remaining block, below, has both:

            • A line containing Line_A, Line_B or Line_C

            • A line containing Expression_X, Expression_Y or Expression_Z

            BEGIN:
            ...
            Line_C
            ...
            test    Expression_X
            ...
            END:
            BEGIN:
            ...
               Expression_X
            ...
            TEST_Line_A
            ...
                END:
               BEGIN:
            ...
            Line_A
            ...
               __Expression_Y__
            ...
                END:
            BEGIN:
            ...
            123456789Expression_Y
            ...
            Line_B_OK
            ...
            END:
            

            Notes :

            • The strings BEGIN: and END: may be preceded by some blank characters

            • You may add characters after the strings BEGIN: and END:

            • The expressions to exclude may occur at any location, within a block
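
The two successive S/R passes above can be sketched in Python as one helper applied twice. This is an approximation under stated assumptions: `[ \t]` stands in for `\h`, `\n` for `\R`, and the helper name `drop_blocks_lacking` is hypothetical.

```python
import re


def drop_blocks_lacking(text: str, *keepers: str) -> str:
    """Delete every BEGIN:...END: block that contains none of `keepers`."""
    # Strings the tempered dot must never step onto.
    banned = "|".join(re.escape(s) for s in ("BEGIN:", *keepers))
    pattern = re.compile(
        rf"^[ \t]*BEGIN:(?:(?!{banned}).)+?END:.*?$\n?",
        re.DOTALL | re.MULTILINE,
    )
    return pattern.sub("", text)


sample = (
    "BEGIN:\nLine_C\ntest    Expression_X\nEND:\n"
    "   BEGIN:\nLine_E\nExpression_X\nEND:\n"
    "BEGIN:\nLine_B   Expression_H\n   END:\n"
    "BEGIN:\nExpression_Z\nLine_G\nEND:\n"
)

# First pass keeps blocks with Line_A/B/C, second pass keeps Expression_X/Y/Z.
step1 = drop_blocks_lacking(sample, "Line_A", "Line_B", "Line_C")
step2 = drop_blocks_lacking(step1, "Expression_X", "Expression_Y", "Expression_Z")
```

Chaining the passes amounts to a logical AND: a block survives only if it satisfies every condition.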

            Best Regards,

            guy038

            • Terry RT
              Terry R @guy038
              last edited by

              @guy038 said in .ics file selection problem:

              We may solve the problem in a more simple way

              I like it very much. You probably saw the issue I had trying to LOOK for the bla bla bla, whereas your idea is to look for any record sets that DON’T have the bla bla bla in them, hence the negative lookahead.

              Might I just add 2 sentences for the benefit of @Marcin-Jewiarz, just in case he didn’t notice.

              1. When you say to use the “Mark” regex (First solution), you forgot to mention the requirement to tick “Bookmark line”. Without it no lines are bookmarked, and the next step will therefore remove ALL lines.
              2. In (?xs-i), the x option makes the rest of the regex “free form”: the spaces shown are NOT part of the pattern and exist ONLY to make it easier to read. This option and the \Q and \E regex functions aren’t used much, but perhaps should be, especially when OPs come to us with placeholder words like “bla bla bla” and we have to say “insert your text in this position”. Without knowing what the actual text is, it can cause issues when one or more of its characters is actually a metacharacter.
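
Both points have rough counterparts in Python's `re` module, which may help to see them in isolation: `re.VERBOSE` behaves like the x option, and `re.escape` does the job of `\Q...\E`. The sketch below is only an analogy; the SUMMARY text `1+1=2 (draft)` is a made-up value chosen because it contains metacharacters.

```python
import re

line = "SUMMARY:1+1=2 (draft)"
literal = "1+1=2 (draft)"  # hypothetical text; + ( ) are regex metacharacters

# Pasting the text straight into the pattern changes its meaning:
# "1+" becomes "one or more 1" and "(draft)" becomes a capture group.
naive = re.search("SUMMARY:" + literal, line)

# Escaping the text first (the Python counterpart of \Q...\E) matches it literally.
quoted = re.search("SUMMARY:" + re.escape(literal), line)

# Free-spacing mode: the unescaped spaces in the pattern are ignored.
# re.escape also escapes the spaces inside the literal, so they still count.
spaced = re.compile(r" SUMMARY: " + re.escape(literal), re.VERBOSE)
```

Here `naive` finds nothing, while `quoted` and `spaced` both match the whole line.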

              Cheers
              Terry

              • Terry RT
                Terry R @guy038
                last edited by Terry R

                @guy038 said in .ics file selection problem:

                We may solve the problem in a more simple way

                @guy038, your 2nd regex (which removes the non “bla bla bla” record sets) intrigued me, and I wondered if a slight alteration might allow the whole process to be carried out with 1 regex: do a (book)mark with a single regex, then use “Remove Unmarked Lines”.

                I think I may have cracked it. I’m still a bit hesitant to put it forward as a solution as it’s quite complicated and dare I say it, not something I’d expect anybody to readily adapt to any future need. It was really just an exercise to satisfy my curiosity.

                So the regex is:
                (?s-i)BEGIN:VEVENT\R((?=SUMMARY:\Qbla bla bla\E).|(?!SUMMARY|BEGIN:).)+?END:VEVENT\R?
                After running this regex with bookmarking on, all the record sets we want to keep are bookmarked. So we’re back with the positive look-ahead (at least in part), which allows us to remove, in one step, both the extraneous data not of the BEGIN:VEVENT…END:VEVENT type and the non “bla bla bla” sets.

                I’d value your input on the validity of this. It appears to work on some demo data which includes some without the “bla bla bla” text so from that point of view it is a success.

                Terry

                • Terry RT
                  Terry R
                  last edited by Terry R

                  To all who are interested in my synopsis:

                  I actually stumbled onto this quite by chance. I’d edited @guy038’s regex to try the positive lookahead again. My regex was picking up all the BEGIN:VEVENT…END:VEVENT sets again. On a whim I added SUMMARY in front of BEGIN: as an alternation inside the negative lookahead, and suddenly it seemed to work. Several tests later it was still working.

                  I’ve now been pulling my regex apart trying to better understand HOW it works, I suppose not quite believing it. It does seem contradictory to have both a positive lookahead and a negative one using the same characters. So if I understand it correctly:

                  1. We start processing a record set starting with the BEGIN:VEVENT
                  2. Several lines later we approach the SUMMARY line where we want to find the bla bla bla string. This is the lookahead.
                  3. For a record set not containing bla bla bla we fail this positive lookahead (?=SUMMARY:\Qbla bla bla\E).
                  4. As step 3 failed, we use the alternation option. At this point it becomes a bit difficult to understand. As alternation works from left to right, we first assert that we don’t want SUMMARY. As we currently do have it, this side of the alternation fails immediately. On the right side we assert that we don’t want BEGIN:, and we don’t have it, so here I would have thought matching would continue, but it appears to fail. At least that record set is NOT bookmarked and we start all over again. Actually, a glimmer of light: is it because, once we move into the SUMMARY line (so the ?!BEGIN actually was true to start with), the positive lookahead will always fail, so we only use the alternation? And in the alternation ?!SUMMARY also always fails, so we are ONLY using the ?!BEGIN as the method of stopping, and that eventually fails us as well, hence the regex fails. Thus the regex won’t bookmark a non bla bla bla set.

                  Whew, have I actually understood it!

                  Terry

                  • Terry RT
                    Terry R
                    last edited by Terry R

                    Further testing has given me another revised regex, shorter than before.

                    I think this one is very easy to understand and could serve as the final solution.

                    (?s-i)BEGIN:VEVENT\R((?=SUMMARY:\Qbla bla bla\E).|(?!SUMMARY:).)+?END:VEVENT\R?

                    1. We want a set that contains the BEGIN and END lines and contains “SUMMARY:bla bla bla”.
                    2. If step 1 fails, the alternation says we CANNOT have a line with SUMMARY: in it within these boundaries. As that WILL fail (unless there is no SUMMARY line at all), the regex fails and thus non bla bla bla record sets are NOT bookmarked.

                    So the proviso is that the record set MUST contain valid start and end points, i.e. BEGIN:VEVENT and END:VEVENT (which we have always assumed throughout these posts), and it MUST contain a line starting with SUMMARY:. What sits between the \Q and \E points in the regex determines which record sets are marked and which are NOT.
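
The behaviour described above, including the proviso about record sets with no SUMMARY line, can be checked with a small Python approximation of this regex. Assumptions: `\n` replaces `\R`, `re.escape` replaces `\Q...\E`, and the name `find_events_with` is invented for the example.

```python
import re


def find_events_with(text: str, summary: str) -> list[str]:
    """Return the record sets this regex would match (i.e. bookmark)."""
    pattern = re.compile(
        r"BEGIN:VEVENT\n"
        # Either we stand exactly on the wanted SUMMARY text, or we must
        # not stand on any SUMMARY: at all. A different SUMMARY: line
        # satisfies neither branch, so the match attempt dies there.
        r"(?:(?=SUMMARY:" + re.escape(summary) + r").|(?!SUMMARY:).)+?"
        r"END:VEVENT\n?",
        re.DOTALL,
    )
    return [m.group(0) for m in pattern.finditer(text)]


block_a = "BEGIN:VEVENT\nDTSTART:1\nSUMMARY:dont include me\nEND:VEVENT\n"
block_b = "BEGIN:VEVENT\nDTSTART:2\nSUMMARY:bla bla bla\nEND:VEVENT\n"
block_c = "BEGIN:VEVENT\nDTSTART:3\nEND:VEVENT\n"  # no SUMMARY line at all
calendar = "BEGIN:VCALENDAR\n" + block_a + block_b + block_c + "END:VCALENDAR\n"

matched = find_events_with(calendar, "bla bla bla")
```

On this sample, `block_a` is skipped, `block_b` is matched, and `block_c` is matched too, which is exactly why every record set needs a SUMMARY: line for the result to be meaningful.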

                    At this point I think I’ve spent enough time on it, my curiosity is now satiated.

                    Terry

                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, @Terry-r and All,

                      In this post, you said :

                      I wondered if a slight alteration might allow the whole process to be carried out with 1 regex

                      I’m sorry, but the two solutions given at the beginning of my post are totally independent! So, to solve @marcin-jewiarz’s problem, you need to run:

                      • The first Mark regex, with the Bookmark line option ticked, then use Search > Bookmark > Remove Unmarked Lines

                      OR

                      • The second regex S/R only

                      So, we do not have to try to mix them up ;-))


                      Then you asked my opinion about your regex :

                      (?s-i)BEGIN:VEVENT\R((?=SUMMARY:\Qbla bla bla\E).|(?!SUMMARY|BEGIN:).)+?END:VEVENT\R?

                      Well, just look at the second alternative, which is (?!SUMMARY|BEGIN:). including its final dot. This alternative means that, between the expressions BEGIN:VEVENT\R and END:VEVENT\R?, the expression SUMMARY or BEGIN: should never occur at any location!

                      So, with this regex, between the expressions BEGIN:VEVENT\R and END:VEVENT\R?:

                      • When the regex engine is at any location, of the block, different from the beginning of a possible line SUMMARY:bla bla bla, this second alternative matches and catches the single character .

                      • When the regex engine is, exactly at the beginning of a line SUMMARY:bla bla bla, the first alternative (?=SUMMARY:\Qbla bla bla\E). does match and catches the single character ., too !

                      So, in short, it matches any char of all blocks containing the expression SUMMARY:bla bla bla

                      Now let’s imagine that you slightly change your regex as below :

                      (?s-i)BEGIN:VEVENT\R((?=SUMMARY:\Qbla bla bla\E).|(?!SUMMARY:\Qbla bla bla\E|BEGIN:).)+?END:VEVENT\R?

                      This time, the two alternatives are totally exclusive regarding the SUMMARY:bla bla bla string! So the whole regex just matches any multi-line block BEGIN:VEVENT.........END:VEVENT!


                      Now, in your last post, you said :

                      Further testing has given me another revised regex, shorter than before

                      As your final regex does not contain the alternative BEGIN: in the negative look-ahead, I support this point ;-)) Indeed, looking back at my second solution, this part is not needed! I certainly needed it at one moment during my tests, but it seems useless in my final try ;-))

                      So, in summary, the two solutions of my previous post should be updated, without the free-spacing mode, as below :

                      • First solution :

                        • Use the Mark regex (?s-i)BEGIN:VEVENT((?!BEGIN:).)*?\Qbla bla bla\E.*?END:VEVENT\R?    with the Bookmark line ticked

                        • Then, run the menu option Search > Bookmark > Remove Unmarked Lines

                      • Second solution :

                        • Use the regex S/R, below, with a negative look-ahead :

                          • SEARCH (?s-i)BEGIN:VEVENT\R((?!SUMMARY:\Qbla bla bla\E).)+?END:VEVENT\R?

                          • REPLACE Leave EMPTY

                      Remark: In the first solution, we still need the regex ((?!BEGIN:).)*? instead of the simple .*? one, to restrict the match to a single block. Indeed, the simple regex .*? can match across a line END:VEVENT and the line BEGIN:VEVENT of the next block!
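
The difference described in this remark is easy to reproduce. The Python sketch below compares a plain `.*?` with the `((?!BEGIN:).)*?` form on two blocks; it assumes `\n` line endings and uses `re.escape` in place of `\Q...\E`, and the variable names are only illustrative.

```python
import re

first = "BEGIN:VEVENT\nSUMMARY:dont include me\nEND:VEVENT\n"
second = "BEGIN:VEVENT\nSUMMARY:bla bla bla\nEND:VEVENT\n"
text = first + second

literal = re.escape("bla bla bla")

# Plain lazy dot: nothing stops it from running past END:VEVENT and
# the next BEGIN:VEVENT while it looks for the literal text.
loose = re.compile(
    r"BEGIN:VEVENT.*?" + literal + r".*?END:VEVENT\n?", re.DOTALL
)

# Tempered dot: each character is consumed only if BEGIN: does not
# start there, so a match cannot leave its own block.
tempered = re.compile(
    r"BEGIN:VEVENT(?:(?!BEGIN:).)*?" + literal + r".*?END:VEVENT\n?", re.DOTALL
)

loose_match = loose.search(text).group(0)
tempered_match = tempered.search(text).group(0)
```

With the plain version the match starts in the unwanted block and swallows both blocks; in a Mark operation that would bookmark lines that should have been removed.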

                      Best Regards,

                      guy038

                      P.S. :

                      I’ve verified that my updated second solution does match, as expected, a BEGIN:VEVENT....END:VEVENT block, which does not contain any line SUMMARY:........ like :

                      BEGIN:VEVENT
                      ...
                      ...
                      END:VEVENT
                      
                      • Terry RT
                        Terry R
                        last edited by Terry R

                        @guy038 said in .ics file selection problem:

                        at beginning of my post are totally independent !

                        Firstly, my apologies. I got fixated on the concept of using a positive lookahead after looking at both of your solutions. For some reason I later mixed them together, thinking they were 2 steps.

                        Perhaps in my defence, I’ve just come to realise that my reasoning all the way through was that there would be extraneous lines between the END:VEVENT and BEGIN:VEVENT lines, that is, between the record sets. I’ve just googled a typical ics file, and whilst that isn’t true, there are additional lines before AND after (header and footer info) the sets we were identifying with the regexes. I took a longish one and reduced its size so you can see what shows in the file.

                        BEGIN:VCALENDAR
                        PRODID:-//Google Inc//Google Calendar 70.9054//EN
                        VERSION:2.0
                        CALSCALE:GREGORIAN
                        METHOD:PUBLISH
                        X-WR-CALNAME:ECML PKDD 2015
                        X-WR-TIMEZONE:Europe/Lisbon
                        X-WR-CALDESC:The European Conference on Machine Learning and Principles and
                          Practice of\nKnowledge Discovery in Databases (ECMLPKDD) will take place i
                         n Porto\,\nPortugal\, from September 7th to 11th\, 2015 (http://www.ecmlpkd
                         d2015.org).\n\nThis event is the leading European scientific event on machi
                         ne learning and\ndata mining and builds upon a very successful series of 25
                          ECML and 18 PKDD\nconferences\, which have been jointly organized for the 
                         past 14 years.
                        BEGIN:VTIMEZONE
                        TZID:Europe/Lisbon
                        X-LIC-LOCATION:Europe/Lisbon
                        BEGIN:STANDARD
                        TZOFFSETFROM:+0100
                        TZOFFSETTO:+0000
                        TZNAME:WET
                        DTSTART:19701025T020000
                        RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
                        END:STANDARD
                        BEGIN:DAYLIGHT
                        TZOFFSETFROM:+0000
                        TZOFFSETTO:+0100
                        TZNAME:WEST
                        DTSTART:19700329T010000
                        RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
                        END:DAYLIGHT
                        END:VTIMEZONE
                        BEGIN:VEVENT
                        DTSTART:20180907T083000Z
                        ...
                        SUMMARY:Ex. Ep. Especial: IP/PROGI 
                        TRANSP:OPAQUE
                        END:VEVENT
                        BEGIN:VEVENT
                        DTSTART;VALUE=DATE:20150803
                        ...
                        SUMMARY:Workshops - Camera Ready
                        TRANSP:TRANSPARENT
                        END:VEVENT
                        BEGIN:VEVENT
                        DTSTART;VALUE=DATE:20150901
                        ...
                        SUMMARY:Tutorials - Tutorials Material
                        TRANSP:TRANSPARENT
                        END:VEVENT
                        END:VCALENDAR
                        

                        So, although the OP never showed this, I had assumed I couldn’t guarantee there weren’t other lines, and I didn’t think to ask.

                        Thanks for critiquing my regexes. I had made a discovery and couldn’t quite believe I hadn’t considered it before. There have been lots of instances where I wanted to find a data set with a specific string using the lookahead and seeing it would continue through other sets UNTIL it found the correct one. The realisation I had the power to stop it upon a failed string search within the 1 data set was (dare I say it) overwhelming. It was like a light had suddenly switched on, learning a new ability with regexes.

                        Cheers
                        Terry
