Can RegEx do the work for me?



  • @Thomas-Karlsson

    did you try my regex? I thought it should do what you want. It is just not 100% clear if we understood your request correctly in first place.



  • Thanks a lot @Ekopalypse & @Alan Kilborn

    I’ll see. Sorry for misinterpreted your answers. My bad!

    Tried your regex Eko. But it seems to just delete the word BEGIN in both Block1 and Block 2.



  • @Thomas-Karlsson

    I assume that the block of data looks like this.
    (by the way this is formatted by using ~~~ data ~~~)

    BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS
    INLINE| 123456|||||||||1573||||||||||||
    INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
    INLINE|PRE19007801|NSP070|ZZ||||0
    INLINE|NSP123|1234||||
    INLINE|NSP123|MI| 123456|9|1.000||||||||||||||||||||
    INLINE|NSP123| 123456|1677
    INLINE|1234|5678
    END
    BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS
    INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
    INLINE|ZZ|||||
    INLINE|||||123456789| |00002|00002| |||||||
    INLINE|1688|1134
    END
    BEGIN| ...
    

    Is this the case? Or is [Block X] really part of the data?



  • @Thomas-Karlsson ,

    Joining the conversation: as the others have hinted at, it helps if:

    1. example data is succinct, without extraneous information, but long enough to show what you want.
    2. you give us distinct blocks of what you want before and after the search/replace
    3. you format the data in a way that it renders properly in the forum. for more on this, see my boilerplate below

    -----
    FYI: here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

    This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.

    If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to. Before asking a new regex question, understand that for future requests, many of us will expect you to show what data you have (exactly), what data you want (exactly), what regex you already tried (to show that you’re showing effort), why you thought that regex would work (to prove it wasn’t just something randomly typed), and what data you’re getting with an explanation of why that result is wrong. When you show that effort, you’ll see us bend over backward to get things working for you. If you need help formatting, see the paragraph above.

    Please note that for all regex and related queries, it is best if you are explicit about what needs to match, and what shouldn’t match, and have multiple examples of both in your example dataset. Often, what shouldn’t match helps define the regular expression as much or more than what should match.



  • Hi guys,

    Thanks for all you help and tips in this matters - appreciated.

    Yes @Ekopalypse, the data blocks looks like this:

    Block one

    BEGIN|JUNE_CUST_123|1|UPC|SW|PREFIX0000123456|NS
    INLINE|   123456|||||||||1573||||||||||||
    INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
    INLINE|PRE19007801|NSP070|ZZ||||0
    INLINE|NSP123|1234||||
    INLINE|NSP123|MI|   123456|9|1.000||||||||||||||||||||
    INLINE|NSP123|   123456|1677
    INLINE|1234|5678
    END
    

    Block two

    BEGIN|JUNE_CUST_123|2|END|SW|PREFIX0000123456|NS
    INLINE|2019-06-23|$_HolderAcctID|1994|Client_No 1134||1134
    INLINE|ZZ|||||
    INLINE|||||123456789| |00002|00002| |||||||
    INLINE|1688|1134
    END
    

    The two Blocks are very similar in it’s structure. But one with more data/rows between the BEGIN and END segments. This Block is the one I call Block One. The other one, what I describe as Block Two has also this BEGIN and END segment but less data/row in between.

    The Blocks, one and two, then repeats it self (filled with different data of course) with up to 10.000 times. The goal here is to create some kind of logic that can mark-up/exclude/delete the data in Block Two.
    Thinking if the first row in each block could be used e.g

    if row begins with:

    BEGIN|JUNE_CUST_123|1|UPC|.... 
    

    read, regardless, until the word END (last word in the block) and keep the block.

    if row begins with:

    BEGIN|JUNE_CUST_123|2|END|SW|.... 
    

    read, regardless, until the word END (last word in the block) and markup/exclude/delete the block.

    Not sure if any of this make any sense.

    Thanks in advance.



  • @Thomas-Karlsson

    but this is what my regex does.

    Sure you used the exact regex I’ve posted?



  • @Thomas-Karlsson,

    Thanks for adding more detail, and adding the formatting… That really helps your post be more understandable.

    @Ekopalypse said:

    but this is what my regex does.

    Maybe my interpretation is slightly different than yours. You seem to think that @Thomas-Karlsson always expects blocks one and two to be alternating, and that block one will always come immediately before block two. If that’s so, then yours would work.

    I looked more into the data, and decided that it’s the |1| vs |2| that determines whether it’s block one or block two, so I would search for (?s)BEGIN\|JUNE_CUST_123\|2\|END.*?END$\R* and replace with nothing. This shows what it looks like with “mark” rather than search/replace:

    (Actually, that image was taken before I added the \R* on the end, because when I switched to Replace mode, it left a blank line, which the \R* at the end gets rid of.)



  • @PeterJones

    You seem to think that @Thomas-Karlsson always expects blocks one and two to be alternating,

    yes, that’s exactly what I thought is the case :-)



  • @Ekopalypse said:

    yes, that’s exactly what I thought is the case :-)

    Easily understandable; that’s probably the more natural interpretation. On this forum (and similar), I just tend to assume that the OP isn’t clear in requirements, so I often try to look for more subtle clues, like the |1| vs |2|, in case the OP has accidentally implied more than intended. Sometimes, this means that my solution does not work for the OP… but sometimes it means mine does work when other, more literal interpretations, do not.

    Since the OP implied yours wasn’t working (though hasn’t answered your direct question), maybe my interpretation will be the lucky one this time. :-)

    If not, @Thomas-Karlsson will have to be more explicit about whether the blocks always alternate, or whether there are ever non-block rows between the two, or other such things.



  • @Ekopalypse and @PeterJones - a big BIG thanks for all your help and support.

    Eko - your regex was marked up each block, both one and tow so it did the job but at the same time not.

    PeterJones - your regex just nailed it and I’m soooo happy!! So impressed. You have know idea how much time I will save now using your regex. Previously I have search and deleted all of theses 1000 and 1000 rows manually.

    You have made my day, week, year :) Thanks and thanks in million!

    Cheers from happy Scandinavia guy :)


Log in to reply