
    Can RegEx do the work for me?

    • Thomas Karlsson

      Hi all,

      I have this big transaction file. The structure in the file could be described as two blocks. Both blocks have x number of “in-lines”. Each block starts with a BEGIN segment and ends with an END segment.

      The thing is that I would like to create some functionality to consistently exclude/delete block 2, regardless of what information it contains. I tried playing around a bit with RegEx in Notepad++ (v6.5) but have had no success so far.

      So I thought I'd drop the question here among all you talented people to help me further.

      [BLOCK 1]
      BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS
      INLINE| 123456|||||||||1573||||||||||||
      INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
      INLINE|PRE19007801|NSP070|ZZ||||0
      INLINE|NSP123|1234||||
      INLINE|NSP123|MI| 123456|9|1.000||||||||||||||||||||
      INLINE|NSP123| 123456|1677
      INLINE|1234|5678
      END
      [BLOCK 2]
      BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS
      INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
      INLINE|ZZ|||||
      INLINE|||||123456789| |00002|00002| |||||||
      INLINE|1688|1134
      END
      [BLOCK 1]
      BEGIN|.…
      ….
      ….
      END
      [BLOCK 2]
      BEGIN|.…
      ….
      ….
      END
      [BLOCK 1]
      BEGIN|.…
      ….
      ….
      END
      [BLOCK 2]
      BEGIN|.…
      ….
      ….
      END
      and so on….

      As I said, I tried to use RegEx but got stuck because the very same row is included in both block one and block two: INLINE number 2 in block one is the same as INLINE number 1 in block two.

      Thanks in advance
      Thomas

      • Ekopalypse @Thomas Karlsson

        @Thomas-Karlsson

        If I understood correctly, what about:
        find what: (?s)(BEGIN.*?\REND\R)(BEGIN.*?\REND\R)
        replace with: $1
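To see what this pattern does outside Notepad++, here is a minimal Python sketch, assuming the file uses Windows (CRLF) or Unix (LF) line endings. Python's `re` module has no `\R`, so `\r?\n` stands in for it, and the replacement `$1` becomes `\1`. The helper name `keep_first_of_each_pair` is hypothetical. Note that, like the original, the pattern needs a line break after every END, including the last one in the file.

```python
import re

# Ekopalypse's Notepad++ pattern, translated for Python's re module.
# Assumption: \r?\n replaces Notepad++'s \R (covers CRLF and LF only).
EKO_PATTERN = re.compile(r"(?s)(BEGIN.*?\r?\nEND\r?\n)(BEGIN.*?\r?\nEND\r?\n)")


def keep_first_of_each_pair(text: str) -> str:
    """Replace each (block, following block) pair with the first block only."""
    # Group 1 is the first BEGIN...END block; "$1" in Notepad++ is "\1" here.
    return EKO_PATTERN.sub(r"\1", text)


sample = (
    "BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS\n"
    "INLINE|1234|5678\n"
    "END\n"
    "BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS\n"
    "INLINE|1688|1134\n"
    "END\n"
)
print(keep_first_of_each_pair(sample))
```

The `|END|` inside the second BEGIN line does not end the block early, because the pattern only accepts an END that sits alone on its own line.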

        • Alan Kilborn @Ekopalypse

          This is one of those cases where I had to see the solution to understand what the problem really was! :)

          • Ekopalypse @Alan Kilborn

            @Alan-Kilborn

            :-D well, then please explain it to me, because I’m not sure that what I’ve posted is really correct.

            • Alan Kilborn @Ekopalypse

              @Ekopalypse

              I think you are correct, at least, but it sure would be better for posters to say:

              This is what I have:

              foo
              foo
              foo
              

              And this is what I need when I’m done:

              bar
              bar
              bar
              

              And BTW you don’t have to come up with a regex for that.

              • Thomas Karlsson

                Thanks for your answers, appreciated.

                Not sure if you are trying to help or if you are just making fun of my question?

                So, regex or not: can I somehow read through thousands and thousands of rows of these 2 blocks and bookmark/exclude/delete Block 2 using Notepad++?

                Thanks

                • Alan Kilborn @Thomas Karlsson

                  @Thomas-Karlsson

                  Nobody’s making fun.

                  And yup, just use Eko’s regex replacement.

                  • Ekopalypse @Thomas Karlsson

                    @Thomas-Karlsson

                    Did you try my regex? I think it should do what you want. It’s just not 100% clear whether we understood your request correctly in the first place.

                    • Thomas Karlsson

                      Thanks a lot @Ekopalypse & @Alan Kilborn

                      I’ll see. Sorry for misinterpreting your answers. My bad!

                      I tried your regex, Eko, but it seems to just delete the word BEGIN in both Block 1 and Block 2.

                      • Ekopalypse @Thomas Karlsson

                        @Thomas-Karlsson

                        I assume that the block of data looks like this.
                        (by the way this is formatted by using ~~~ data ~~~)

                        BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS
                        INLINE| 123456|||||||||1573||||||||||||
                        INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
                        INLINE|PRE19007801|NSP070|ZZ||||0
                        INLINE|NSP123|1234||||
                        INLINE|NSP123|MI| 123456|9|1.000||||||||||||||||||||
                        INLINE|NSP123| 123456|1677
                        INLINE|1234|5678
                        END
                        BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS
                        INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
                        INLINE|ZZ|||||
                        INLINE|||||123456789| |00002|00002| |||||||
                        INLINE|1688|1134
                        END
                        BEGIN| ...
                        

                        Is this the case? Or is [Block X] really part of the data?
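Eko's question matters because his pattern needs each END line to be followed directly by the next BEGIN. A quick Python check, assuming `\r?\n` in place of Notepad++'s `\R` and using shortened, made-up block contents, shows the pattern matching when blocks sit back to back and finding nothing when a `[BLOCK 2]` label line sits between them:

```python
import re

# Ekopalypse's pattern translated for Python (assumption: \R -> \r?\n).
PAIR = re.compile(r"(?s)(BEGIN.*?\r?\nEND\r?\n)(BEGIN.*?\r?\nEND\r?\n)")

# Illustrative, shortened blocks (not the real transaction data).
plain = "BEGIN|x|1\nINLINE|a\nEND\nBEGIN|x|2|END\nINLINE|b\nEND\n"
labelled = "[BLOCK 1]\nBEGIN|x|1\nINLINE|a\nEND\n[BLOCK 2]\nBEGIN|x|2|END\nINLINE|b\nEND\n"

print(PAIR.search(plain) is not None)     # True: END is directly followed by BEGIN
print(PAIR.search(labelled) is not None)  # False: "[BLOCK 2]" sits between END and BEGIN
```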

                        • PeterJones

                          @Thomas-Karlsson ,

                          Joining the conversation: as the others have hinted at, it helps if:

                          1. example data is succinct, without extraneous information, but long enough to show what you want.
                          2. you give us distinct blocks of what you want before and after the search/replace
                          3. you format the data in a way that it renders properly in the forum. For more on this, see my boilerplate below

                          -----
                          FYI: here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

                          This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.

                          If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.

                          Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.

                          • Thomas Karlsson

                            Hi guys,

                            Thanks for all your help and tips in this matter; much appreciated.

                            Yes @Ekopalypse, the data blocks look like this:

                            Block one

                            BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS
                            INLINE|   123456|||||||||1573||||||||||||
                            INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
                            INLINE|PRE19007801|NSP070|ZZ||||0
                            INLINE|NSP123|1234||||
                            INLINE|NSP123|MI|   123456|9|1.000||||||||||||||||||||
                            INLINE|NSP123|   123456|1677
                            INLINE|1234|5678
                            END
                            

                            Block two

                            BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS
                            INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
                            INLINE|ZZ|||||
                            INLINE|||||123456789| |00002|00002| |||||||
                            INLINE|1688|1134
                            END
                            

                            The two blocks are very similar in structure, but one has more data/rows between the BEGIN and END segments; this is the one I call Block One. The other one, which I call Block Two, also has BEGIN and END segments but fewer rows in between.

                            Blocks one and two then repeat themselves (filled with different data, of course) up to 10,000 times. The goal here is to create some kind of logic that can mark up/exclude/delete the data in Block Two.
                            I was thinking the first row in each block could be used, e.g.:

                            if row begins with:

                            BEGIN|JUNE_CUST_123|1|UPC|.... 
                            

                            read, regardless, until the word END (last word in the block) and keep the block.

                            if row begins with:

                            BEGIN|JUNE_CUST_123|2|END|SW|.... 
                            

                            read, regardless, until the word END (last word in the block) and markup/exclude/delete the block.
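The row-by-row logic described above does not strictly need a regex. Here is a minimal Python sketch of the same idea as a hypothetical helper `drop_block_two`, assuming that the third `|`-separated field of each BEGIN line is the block number (`1` or `2`) and that every block ends with a line that is exactly END:

```python
def drop_block_two(lines):
    """Keep blocks whose BEGIN line has '1' in field 3; drop blocks with '2'.

    Hypothetical helper. Assumes each block starts with a 'BEGIN|' line,
    ends with a line that is exactly 'END', and field 3 is the block number.
    """
    kept = []
    skipping = False
    for line in lines:
        if line.startswith("BEGIN|"):
            fields = line.split("|")
            # e.g. BEGIN|JUNE_CUST_123|2|END|... -> fields[2] == "2"
            skipping = len(fields) > 2 and fields[2] == "2"
        if not skipping:
            kept.append(line)
        elif line.rstrip("\r\n") == "END":
            # The END of block two is dropped too; keep lines again after it.
            skipping = False
    return kept


lines = [
    "BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS",
    "INLINE|1234|5678",
    "END",
    "BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS",
    "INLINE|1688|1134",
    "END",
]
print(drop_block_two(lines))
```

Because it decides per block from the BEGIN line, this approach does not care whether the blocks alternate. Lines outside any block (such as a `[BLOCK 2]` label) are kept as they are.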

                            Not sure if any of this makes any sense.

                            Thanks in advance.

                            • Ekopalypse @Thomas Karlsson

                              @Thomas-Karlsson

                              But this is what my regex does.

                              Are you sure you used the exact regex I posted?

                              • PeterJones

                                @Thomas-Karlsson,

                                Thanks for adding more detail, and adding the formatting… That really helps your post be more understandable.

                                @Ekopalypse said:

                                but this is what my regex does.

                                Maybe my interpretation is slightly different than yours. You seem to think that @Thomas-Karlsson always expects blocks one and two to be alternating, and that block one will always come immediately before block two. If that’s so, then yours would work.

                                I looked more into the data, and decided that it’s the |1| vs |2| that determines whether it’s block one or block two, so I would search for (?s)BEGIN\|JUNE_CUST_123\|2\|END.*?END$\R* and replace with nothing. This shows what it looks like with “mark” rather than search/replace:

                                (Actually, that image was taken before I added the \R* on the end, because when I switched to Replace mode, it left a blank line, which the \R* at the end gets rid of.)
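For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of PeterJones's search/replace. Two translation assumptions: Notepad++ treats `$` as end of line, which in Python needs `re.MULTILINE` (the `m` flag), and Python has no `\R`, so `\r?\n` stands in for a line break (with `\r?` before `$` so Windows CRLF files also work). `JUNE_CUST_123` is hardcoded exactly as in the forum pattern; the helper name `delete_block_two` is hypothetical.

```python
import re

# PeterJones's Notepad++ pattern, translated for Python's re module.
# Assumptions: "m" makes $ match at end of line (as in Notepad++);
# \r?\n stands in for \R; \r? before $ lets CRLF files match too.
BLOCK_TWO = re.compile(r"(?sm)BEGIN\|JUNE_CUST_123\|2\|END.*?END\r?$(?:\r?\n)*")


def delete_block_two(text: str) -> str:
    """Remove every block whose BEGIN line starts with ...|2|END, plus trailing line breaks."""
    return BLOCK_TWO.sub("", text)


sample = (
    "BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS\n"
    "INLINE|1234|5678\n"
    "END\n"
    "BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS\n"
    "INLINE|1688|1134\n"
    "END\n"
    "BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS\n"
    "INLINE|NSP123|1234||||\n"
    "END\n"
    "BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS\n"
    "INLINE|ZZ|||||\n"
    "END\n"
)
print(delete_block_two(sample))
```

The trailing `(?:\r?\n)*` plays the role of the `\R*` mentioned above: without it, each deleted block would leave an empty line behind.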

                                • Ekopalypse @PeterJones

                                  @PeterJones

                                  You seem to think that @Thomas-Karlsson always expects blocks one and two to be alternating,

                                  yes, that’s exactly what I thought is the case :-)

                                  • PeterJones

                                    @Ekopalypse said:

                                    yes, that’s exactly what I thought is the case :-)

                                    Easily understandable; that’s probably the more natural interpretation. On this forum (and similar), I just tend to assume that the OP isn’t clear in requirements, so I often try to look for more subtle clues, like the |1| vs |2|, in case the OP has accidentally implied more than intended. Sometimes, this means that my solution does not work for the OP… but sometimes it means mine does work when other, more literal interpretations, do not.

                                    Since the OP implied yours wasn’t working (though hasn’t answered your direct question), maybe my interpretation will be the lucky one this time. :-)

                                    If not, @Thomas-Karlsson will have to be more explicit about whether the blocks always alternate, or whether there are ever non-block rows between the two, or other such things.
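The difference between the two interpretations only shows up when the blocks do not strictly alternate. A small Python comparison, using both forum patterns translated under the same assumptions (`\r?\n` for `\R`, `re.MULTILINE` so `$` means end of line) and a made-up file with two block ones in a row:

```python
import re

# Both forum patterns translated for Python (assumptions: \R -> \r?\n,
# and the "m" flag so that $ means end of line, as in Notepad++).
EKO = re.compile(r"(?s)(BEGIN.*?\r?\nEND\r?\n)(BEGIN.*?\r?\nEND\r?\n)")
PETER = re.compile(r"(?sm)BEGIN\|JUNE_CUST_123\|2\|END.*?END\r?$(?:\r?\n)*")

ONE = "BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS\nINLINE|1234|5678\nEND\n"
TWO = "BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS\nINLINE|1688|1134\nEND\n"

alternating = ONE + TWO + ONE + TWO
# Hypothetical file where the blocks do NOT alternate: one, one, two.
non_alternating = ONE + ONE + TWO

# Strictly alternating: both approaches give the same result.
print(EKO.sub(r"\1", alternating) == PETER.sub("", alternating))  # True

# Not alternating: Eko's pairs up two block ones and keeps the block two.
print(EKO.sub(r"\1", non_alternating) == ONE + TWO)  # True
print(PETER.sub("", non_alternating) == ONE + ONE)   # True
```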

                                    • Thomas Karlsson

                                      @Ekopalypse and @PeterJones, a big BIG thanks for all your help and support.

                                      Eko, your regex marked up each block, both one and two, so it did the job but at the same time didn't.

                                      PeterJones, your regex just nailed it and I’m soooo happy!! So impressed. You have no idea how much time I will save now using your regex. Previously I searched for and deleted all of these thousands and thousands of rows manually.

                                      You have made my day, week, year :) Thanks a million!

                                      Cheers from a happy Scandinavian guy :)

                                      The Community of users of the Notepad++ text editor.