Can anyone help with this regex?



  • I have data like the following:

    1.gooddata
    2.gooddata
    3.gooddata
    FF

    random
    notrelevant


    header


    4.gooddata
    5.gooddata
    6.gooddata
    FF

    This pattern repeats over and over. My question is: how do I use a regex to find “FF” as a starting point and delete everything between the “FF” and “- - - - - -”, so that the final output looks like this:

    1.gooddata
    2.gooddata
    3.gooddata
    4.gooddata
    5.gooddata
    6.gooddata

    Many thanks for reading my post.



  • Search for FF.*?- - - - - - and make sure to check the “. matches newline” box.

    In general, if you have any starting string S and ending string E, you can just put .*? between them, like S.*?E

    Edit: Well this would get you part of the way I think…
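
For readers who want to try the S.*?E idea outside the editor, here is a minimal sketch in Python's `re` module. The function name `delete_between` and the sample text are made up for illustration, and the position of the “- - - - - -” line is an assumption, since the sample in the question does not show it. `re.DOTALL` stands in for the “. matches newline” checkbox.

```python
import re

def delete_between(text: str, start: str, end: str) -> str:
    """Delete each shortest span running from `start` through `end`."""
    # re.escape treats both delimiters as literal text, not regex syntax.
    pattern = re.escape(start) + r".*?" + re.escape(end)
    # re.DOTALL lets "." match newlines, so a span may cross lines.
    return re.sub(pattern, "", text, flags=re.DOTALL)

# Hypothetical sample: the "- - - - - -" line is assumed to follow "header".
sample = "keep1\nFF\nrandom\nheader\n- - - - - -\nkeep2\n"
result = delete_between(sample, "FF", "- - - - - -")
print(repr(result))  # 'keep1\n\nkeep2\n'
```

The newline that followed the end delimiter is left behind as an empty line, which is one reason this pattern only gets part of the way.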



  • This should do it, as best I can tell from your description of the data (i.e., without getting too crazy about trying to catch situations you didn’t describe; for example, are there space characters after FF on those lines?):

    Find what box:

    (?s)FF\R.*?FF\R
    

    Replace with box: make sure it is empty!

    Search Mode: Regular expression



  • Hi,

    I’m not sure your sample is complete. I can also see a header section there that you didn’t mention when you talked about just FF and - - - - - -, so I’m not sure whether it is all part of the text.

    But try this:

    1. Back up your file !!!
    2. CTRL + H (Replace)
    3. Find what: ^((FF|header)[\s\S]*?- - - - - -|\s*)$[\r\n]+
      Replace with: (empty => delete)
      Search Mode: Regular expression
    4. Replace All

    My short explanation of: ^((FF|header)[\s\S]*?- - - - - -|\s*)$[\r\n]+

    • Look for a line starting with FF OR header. If found, select all the following text until you reach - - - - - -.
    • In addition (OR), select blank lines.

    That’s as much as I can get from your text. If there are some spaces or anything else that differs, just update the sample data so we can update the pattern to match it.

    For a complete technical explanation of the pattern, paste the expression into Regex101.
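
As a rough way to test this pattern outside the editor, the sketch below runs it through Python's `re` module. Two assumptions are worth flagging: the sample text is hypothetical (the question never shows where the “- - - - - -” line sits, so it is placed after `header` here), and `re.MULTILINE` is needed so that `^` and `$` match at line boundaries.

```python
import re

# Same pattern as above; MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(
    r"^((FF|header)[\s\S]*?- - - - - -|\s*)$[\r\n]+",
    re.MULTILINE,
)

# Hypothetical sample; the position of "- - - - - -" is an assumption.
sample = (
    "1.gooddata\n"
    "2.gooddata\n"
    "FF\n"
    "\n"
    "random\n"
    "\n"
    "header\n"
    "- - - - - -\n"
    "\n"
    "3.gooddata\n"
    "\n"
    "4.gooddata\n"
)

cleaned = PATTERN.sub("", sample)
print(cleaned)
```

On this sample the first alternative removes the FF block through the “- - - - - -” line, and the second alternative removes the stray blank line between the remaining data lines.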



  • Hello Shayne Z. and All,

    I think I’ve got a general regex which allows you to search for and delete the smallest range between two strings, let’s say ABC and XYZ, INCLUDING the two lines that contain these strings ABC and XYZ. So :

    • The first line deleted will be the line containing the string ABC. This line may be any of these four forms : ABC or ABC789 or 123ABC or 123ABC789.

    • The nearest line, containing the string XYZ, will be the last line deleted. This line, as well, may be any of the four forms : XYZ or XYZ789 or 123XYZ or 123XYZ789

    • Every line, even blank or empty ones, between the two lines above will be deleted


    This regex does work for particular cases such as :

    • A single line, containing the two strings ABC and XYZ

    • Two consecutive lines, containing ABC, then XYZ

    • Lines containing several start delimiter ABC and/or end delimiter XYZ

    • Lines with a mixed form of these two delimiters, as, for instance, the line 123ABC456XYZ789XYZ012ABC345ABCXYZ6789

    Of course, you must replace the example delimiters ABC and XYZ, by your own strings, used as delimiters !


    So, just follow the few steps, below :

    • Select a range of text, ONLY IF you want to restrict the future suppression to a part of your file

    • Open the Replace dialog ( CTRL + H )

    • Choose the Regular expression search mode

    • Check, preferably, the Match case option

    • Check the In selection option, if you previously selected some amount of text

    • In the Find what zone, type in (?-s)^.*ABC(?s).*?(?-s)XYZ.*(\R|\z)

    • Leave the Replace With zone EMPTY

    • Finally, click on the Replace All button

    Et voilà !


    Some explanations :

    • The (?-s) syntax is a modifier which means that the DOT character does NOT match the END of LINE characters ( \r, \n or \r\n ). Note that the opposite form, (?s), means that, from now on, the DOT matches absolutely ANY character !

    • The regex ^.*ABC matches from a beginning of line to the last string ABC found, further, in the SAME line

    • The regex (?s).*? matches any character, EVEN the END of LINE character(s), till the nearest string XYZ, found, further, even some lines after !

    • The regex (?-s)XYZ.* matches the string XYZ, then any standard character, on the SAME line, till its END of LINE character(s)

    • Finally, the regex (\R|\z) matches any EOL character(s) ( \r\n in a Windows file, \n in an UNIX file or \r in an old MAC file ) OR the VERY end of the file


    IMPORTANT :

    The way I put the different option modifiers, in the regex above, allows you to use regexes, instead of fixed strings, as delimiters :-) For instance, let’s suppose that :

    • The first line to delete would be a line containing the string ABC and, further, on the same line, the string DEF,

    • The last line to delete would be a line containing the string UVW and, further, on the same line, the string XYZ

    In that case, the search regex, above, would become :

    (?-s)^.*ABC.*DEF(?s).*?(?-s)UVW.*XYZ.*(\R|\z)
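
For anyone who wants to experiment with this outside Notepad++, here is a minimal sketch of the same idea in Python's `re` module. It is a translation, not the original regex: Python does not allow `(?-s)` or `(?s)` in the middle of a pattern and has no `\R`, so the scoped group `(?s:...)` and `(?:\n|\Z)` are used instead, and the text is assumed to use `\n` line endings only. The helper name `block_pattern` and the sample text are hypothetical.

```python
import re

def block_pattern(start: str, end: str) -> re.Pattern:
    """Build a whole-line block matcher from two delimiter regexes."""
    return re.compile(
        rf"^.*(?:{start})"   # first line: line start up to the start delimiter
        rf"(?s:.*?)"         # shortest run of anything, newlines included
        rf"(?:{end}).*"      # end delimiter plus the rest of its line
        rf"(?:\n|\Z)",       # trailing newline, or the very end of the text
        re.MULTILINE,        # ^ must match at every line start
    )

text = (
    "keep 1\n"
    "xx ABC yy DEF zz\n"
    "middle\n"
    "UVW and XYZ here\n"
    "keep 2\n"
    "ABC only\n"
    "keep 3\n"
)

# Start line needs ABC then DEF; end line needs UVW then XYZ.
result = block_pattern(r"ABC.*DEF", r"UVW.*XYZ").sub("", text)
print(result)
```

In this sample the line `ABC only` survives, because outside the scoped group the dot cannot cross a line break, so `ABC.*DEF` has to be satisfied on a single line.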

    Best regards,

    guy038



  • Hi All,

    I just forgot to give an example of the general S/R, detailed, in my previous post !

    So, given the upper-case string ABC as a start delimiter and the upper-case string XYZ as an end delimiter, we get the regex :

    • SEARCH = (?-s)^.*ABC(?s).*?(?-s)XYZ.*(\R|\z)

    • REPLACE = NOTHING

    The text, below :

    This line, containing ABC, will be deleted
    This is a BLOCK
    
    of text which will			 
    be DELETED
    
    as well as this line XYZ
    This piece of text
    
    will NOT be DELETED
    
    but the BLOCK of the TWO NEXT ONES will
    ABC
    XYZ
    This text, with some blank lines,
    
    
    won't be modified, but the NEXT line will !
    ABCXYZ
    
    The BLOCK of the TWO NEXT lines, below, will be DELETED
    12345ABC 67890 ABC
    --- XYZ XYZ ---
    
    as well as this LAST block, below
    --- ABC --- XYZ --- ABC  
    
    --- ABC --- XYZ --- XYZ --- ABC --- ABCXYZ ---
    

    will be CHANGED into :

    This piece of text
    
    will NOT be DELETED
    
    but the BLOCK of the TWO NEXT ONES will
    This text, with some blank lines,
    
    
    won't be modified, but the NEXT line will !
    
    The BLOCK of the TWO NEXT lines, below, will be DELETED
    
    as well as this LAST block, below
    

    Cheers,

    guy038
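
The behaviour shown in the example above can be spot-checked with a short script. This is only a sketch: it uses a Python translation of the search regex (scoped `(?s:...)` instead of `(?s)`/`(?-s)`, and `(?:\n|\Z)` instead of `(\R|\z)`), assumes `\n` line endings, and runs on an abridged, partly invented version of the sample text rather than the full one.

```python
import re

# Python translation of (?-s)^.*ABC(?s).*?(?-s)XYZ.*(\R|\z), LF endings assumed.
SEARCH = re.compile(r"^.*ABC(?s:.*?)XYZ.*(?:\n|\Z)", re.MULTILINE)

# Abridged sample covering four cases: a multi-line block, two consecutive
# lines, both delimiters on one line, and repeated delimiters.
sample = (
    "This line, containing ABC, will be deleted\n"
    "be DELETED\n"
    "\n"
    "as well as this line XYZ\n"
    "This piece of text\n"
    "ABC\n"
    "XYZ\n"
    "kept\n"
    "ABCXYZ\n"
    "\n"
    "12345ABC 67890 ABC\n"
    "--- XYZ XYZ ---\n"
    "last kept line\n"
)

output = SEARCH.sub("", sample)
print(output)
```

Only the three lines without delimiters and the one blank line outside any block remain.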

