
    Can RegEx do the work for me?

    17 Posts 4 Posters 4.7k Views
    • Alan Kilborn @Thomas Karlsson

      @Thomas-Karlsson

      Nobody’s making fun.

      And yup, just use Eko’s regex replacement.

      • Ekopalypse @Thomas Karlsson

        @Thomas-Karlsson

        Did you try my regex? I think it should do what you want. It is just not 100% clear whether we understood your request correctly in the first place.

        • Thomas Karlsson

          Thanks a lot @Ekopalypse & @Alan Kilborn

          I’ll see. Sorry for misinterpreting your answers. My bad!

          I tried your regex, Eko, but it seems to just delete the word BEGIN in both Block 1 and Block 2.

          • Ekopalypse @Thomas Karlsson

            @Thomas-Karlsson

            I assume that the block of data looks like this.
            (by the way this is formatted by using ~~~ data ~~~)

            BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS
            INLINE| 123456|||||||||1573||||||||||||
            INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
            INLINE|PRE19007801|NSP070|ZZ||||0
            INLINE|NSP123|1234||||
            INLINE|NSP123|MI| 123456|9|1.000||||||||||||||||||||
            INLINE|NSP123| 123456|1677
            INLINE|1234|5678
            END
            BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS
            INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
            INLINE|ZZ|||||
            INLINE|||||123456789| |00002|00002| |||||||
            INLINE|1688|1134
            END
            BEGIN| ...
            

            Is this the case? Or is [Block X] really part of the data?

            • PeterJones

              @Thomas-Karlsson ,

              Joining the conversation: as the others have hinted at, it helps if:

              1. example data is succinct, without extraneous information, but long enough to show what you want.
              2. you give us distinct blocks of what you want before and after the search/replace
              3. you format the data so that it renders properly in the forum (for more on this, see my boilerplate below)

              -----
              FYI: here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

              This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.

              If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.

              Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.
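One way to make "what should match" and "what shouldn't match" explicit is to write both lists down and check a candidate pattern against them. The following is only a minimal sketch in Python rather than in Notepad++ itself; the pattern and the example rows are assumptions chosen for illustration, taken from the sample data in this thread.

```python
import re

# Hypothetical pattern under test: the first row of a block whose third field is 2.
pattern = re.compile(r"^BEGIN\|JUNE_CUST_123\|2\|END\|")

# Rows that the pattern is expected to hit.
should_match = [
    "BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS",
]

# Rows that the pattern must leave alone.
should_not_match = [
    "BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS",
    "INLINE|1688|1134",
    "END",
]

# Collect every example that behaves differently from what was stated.
failures = [s for s in should_match if not pattern.search(s)]
failures += [s for s in should_not_match if pattern.search(s)]

print("all examples behave as expected" if not failures else failures)
```

Posting the two lists alongside the question gives helpers the same information in a form they can test directly.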

              • Thomas Karlsson

                Hi guys,

                Thanks for all your help and tips on this matter, much appreciated.

                Yes @Ekopalypse, the data blocks look like this:

                Block one

                BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS
                INLINE|   123456|||||||||1573||||||||||||
                INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
                INLINE|PRE19007801|NSP070|ZZ||||0
                INLINE|NSP123|1234||||
                INLINE|NSP123|MI|   123456|9|1.000||||||||||||||||||||
                INLINE|NSP123|   123456|1677
                INLINE|1234|5678
                END
                

                Block two

                BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS
                INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
                INLINE|ZZ|||||
                INLINE|||||123456789| |00002|00002| |||||||
                INLINE|1688|1134
                END
                

                The two blocks are very similar in structure, but one has more data rows between the BEGIN and END segments; this is the one I call Block One. The other, which I call Block Two, has the same BEGIN and END segments but fewer data rows in between.

                Blocks One and Two then repeat (filled with different data, of course) up to 10,000 times. The goal is to create some kind of logic that can mark up, exclude, or delete the data in Block Two.
                I am wondering whether the first row of each block could be used, e.g.:

                if row begins with:

                BEGIN|JUNE_CUST_123|1|UPC|.... 
                

                read, regardless, until the word END (last word in the block) and keep the block.

                if row begins with:

                BEGIN|JUNE_CUST_123|2|END|SW|.... 
                

                read, regardless, until the word END (last word in the block) and markup/exclude/delete the block.
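The keep/delete logic described above can be sketched as a small line-by-line filter. This is only an illustration in Python, not something Notepad++ runs; the function name `drop_blocks`, the prefix string, and the shortened sample data are assumptions made for the example.

```python
def drop_blocks(lines, drop_prefix):
    """Yield lines, skipping every block whose first row starts with drop_prefix."""
    skipping = False
    for line in lines:
        row = line.rstrip("\r\n")
        if not skipping and row.startswith(drop_prefix):
            skipping = True   # entering a block that should be discarded
        if not skipping:
            yield line        # row belongs to a block we keep
        elif row == "END":
            skipping = False  # discarded block is closed; resume keeping rows


# Shortened sample: one block to keep, one block to delete.
sample = """\
BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS
INLINE|1234|5678
END
BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS
INLINE|1688|1134
END
""".splitlines(keepends=True)

kept = list(drop_blocks(sample, "BEGIN|JUNE_CUST_123|2|END|"))
print("".join(kept), end="")
```

The sketch assumes that every block ends with a row containing only END and that blocks are never nested.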

                Not sure if any of this makes any sense.

                Thanks in advance.

                • Ekopalypse @Thomas Karlsson

                  @Thomas-Karlsson

                  But this is what my regex does.

                  Are you sure you used the exact regex I posted?

                  • PeterJones

                    @Thomas-Karlsson,

                    Thanks for adding more detail, and adding the formatting… That really helps your post be more understandable.

                    @Ekopalypse said:

                    but this is what my regex does.

                    Maybe my interpretation is slightly different than yours. You seem to think that @Thomas-Karlsson always expects blocks one and two to be alternating, and that block one will always come immediately before block two. If that’s so, then yours would work.

                    I looked more into the data, and decided that it’s the |1| vs |2| that determines whether it’s block one or block two, so I would search for (?s)BEGIN\|JUNE_CUST_123\|2\|END.*?END$\R* and replace with nothing. This shows what it looks like with “mark” rather than search/replace:

                    (Actually, that image was taken before I added the \R* on the end, because when I switched to Replace mode, it left a blank line, which the \R* at the end gets rid of.)
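For anyone who wants to try the same idea outside Notepad++, here is a rough Python approximation of that search. It is a sketch under stated assumptions, not an exact port: Python's `re` module has no `\R`, so `(?:\r?\n)*` stands in for it, `re.MULTILINE` is needed for `$` to match at line ends, and an optional `\r` is allowed before `$` to cope with Windows line endings. The sample text is shortened and partly made up.

```python
import re

# Python approximation of the Notepad++ search
#   (?s)BEGIN\|JUNE_CUST_123\|2\|END.*?END$\R*
# DOTALL plays the role of (?s); MULTILINE lets $ match at the end of each line.
BLOCK_TWO = re.compile(
    r"BEGIN\|JUNE_CUST_123\|2\|END.*?END\r?$(?:\r?\n)*",
    re.DOTALL | re.MULTILINE,
)

# Shortened sample: a |2| block sitting between two |1| blocks.
text = (
    "BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS\n"
    "INLINE|1234|5678\n"
    "END\n"
    "BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS\n"
    "INLINE|1688|1134\n"
    "END\n"
    "BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS\n"
    "INLINE|9999|0000\n"
    "END\n"
)

# Replace every |2| block (and the line break after it) with nothing.
cleaned = BLOCK_TWO.sub("", text)
print(cleaned, end="")
```

Because the match is keyed on the `|2|END` header rather than on block order, it does not matter whether the two kinds of block alternate.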

                    • Ekopalypse @PeterJones

                      @PeterJones

                      You seem to think that @Thomas-Karlsson always expects blocks one and two to be alternating,

                      yes, that’s exactly what I thought is the case :-)

                      • PeterJones

                        @Ekopalypse said:

                        yes, that’s exactly what I thought is the case :-)

                        Easily understandable; that’s probably the more natural interpretation. On this forum (and similar), I just tend to assume that the OP isn’t clear in requirements, so I often try to look for more subtle clues, like the |1| vs |2|, in case the OP has accidentally implied more than intended. Sometimes, this means that my solution does not work for the OP… but sometimes it means mine does work when other, more literal interpretations, do not.

                        Since the OP implied yours wasn’t working (though hasn’t answered your direct question), maybe my interpretation will be the lucky one this time. :-)

                        If not, @Thomas-Karlsson will have to be more explicit about whether the blocks always alternate, or whether there are ever non-block rows between the two, or other such things.

                        • Thomas Karlsson

                          @Ekopalypse and @PeterJones - a big BIG thanks for all your help and support.

                          Eko - your regex marked up each block, both one and two, so it did the job but at the same time did not.

                          PeterJones - your regex just nailed it and I’m soooo happy!! So impressed. You have no idea how much time I will save now using your regex. Previously I searched for and deleted all of these thousands and thousands of rows manually.

                          You have made my day, week, year :) Thanks a million!

                          Cheers from a happy Scandinavian guy :)

                          The Community of users of the Notepad++ text editor.