Help with regular expression.



  • Hi,
    I need help with a regular expression. The situation is:

    I need to delete lines like this:
    @0C 73 20 2C
    @0D 5E 00 00
    @12 E0 B1 4D

    All lines that begin with @ and are the same length (12 chars)

    Thanks in advance.



  • Use Replace with regular expressions enabled, and replace this pattern:
    @[[:alnum:]]{2} [[:alnum:]]{2} [[:alnum:]]{2} [[:alnum:]]{2}
    with nothing (leave the replacement blank).
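    A minimal sketch of this replacement, using Python's re module purely for illustration (an assumption; the thread itself is about an editor's Replace dialog). Python does not support POSIX classes such as [[:alnum:]], so the class is spelled out as [A-Za-z0-9]. Because the pattern does not consume the line break, replacing with nothing leaves each matched line behind as an empty line:

```python
import re

# [[:alnum:]] is a POSIX class that Python's re module does not support,
# so it is spelled out here as [A-Za-z0-9] (ASCII letters and digits).
PATTERN = re.compile(r"@[A-Za-z0-9]{2} [A-Za-z0-9]{2} [A-Za-z0-9]{2} [A-Za-z0-9]{2}")

text = "keep me\n@0C 73 20 2C\n@0D 5E 00 00\nalso keep\n"

# Replacing with "" removes the matched characters but not the line break,
# so each matched line is left behind as an empty line.
result = PATTERN.sub("", text)
print(repr(result))  # 'keep me\n\n\nalso keep\n'
```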



  • It works!!! :)

    Thanks.



  • Hello, @веселин-николов, @lena-marokez, and All,

    As the text to be deleted seems to be a list of four byte values in hexadecimal, here is a more strict regex syntax, using the POSIX character class [[:xdigit:]]:

    SEARCH ^@(?:[[:xdigit:]]{2}\x20?){4}\R

    REPLACE Leave EMPTY

    Notes :

    • The part ^@ searches for a literal @ character at the beginning ( ^ ) of a line

    • Then, the part [[:xdigit:]]{2} tries to match two hexadecimal digits, followed by an optional space character ( \x20? )

    • That second part is enclosed in a non-capturing group ( (?:.......) ) and must be present four times ( {4} ). The match ends with the line break ( \R ), which can be either \r\n for Windows files, \n for Unix files or \r for classic Mac files

    • As the replacement field is empty, every matched line is deleted entirely
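    The same search/replace can be sketched with Python's re module, assuming [[:xdigit:]] translates to [0-9A-Fa-f] and \R is approximated by (?:\r\n|\n|\r) (Python has neither; the editor's own \R may accept a few more line separators). Here the line break is part of the match, so matched lines disappear completely:

```python
import re

# Assumed Python translation of  ^@(?:[[:xdigit:]]{2}\x20?){4}\R
#   [[:xdigit:]] -> [0-9A-Fa-f]      (Python's re has no POSIX classes)
#   \R           -> (?:\r\n|\n|\r)   (only the three endings named above)
# re.MULTILINE makes ^ match at the start of every line, not just the string.
PATTERN = re.compile(r"^@(?:[0-9A-Fa-f]{2}\x20?){4}(?:\r\n|\n|\r)", re.MULTILINE)

windows_text = "keep me\r\n@0C 73 20 2C\r\n@12 E0 B1 4D\r\nalso keep\r\n"
unix_text = "keep me\n@0D 5E 00 00\nalso keep\n"

# Empty replacement: the line break is consumed too, so each line vanishes.
print(repr(PATTERN.sub("", windows_text)))  # 'keep me\r\nalso keep\r\n'
print(repr(PATTERN.sub("", unix_text)))     # 'keep me\nalso keep\n'
```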

    Best Regards,

    guy038



  • Hi @guy038, I’m wondering if your regex is a bit too lenient.

    The user stated that the line starts with ‘@’ and is 12 characters long, so the \h or \x20 is a required element, not an optional one. I can see you were trying to lump all 4 instances together, so you had to make the space optional to cater for the last instance. However, your regex will also match lines such as @12 3456 7A.

    Without knowing what other lines might exist in the data, this could potentially catch lines that should not be erased. Whilst my regex is more long-winded, it builds on yours and is stricter as well: it looks for 1 pair of hexadecimal digits, then 3 lots of a ‘space’ followed by a pair of hexadecimal digits.

    Find what: ^@[[:xdigit:]]{2}(?:\x20[[:xdigit:]]{2}){3}\R
    Replace with: Leave EMPTY
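    To illustrate the difference, here is a small Python comparison of the two patterns, using assumed translations ([[:xdigit:]] as [0-9A-Fa-f], \R as (?:\r\n|\n|\r)) since Python's re supports neither. The lenient pattern accepts @12 3456 7A and even @12345678 (no spaces at all), while the strict one only accepts the 12-character layout:

```python
import re

# Assumed Python translations of the two patterns from this thread.
LENIENT = re.compile(r"^@(?:[0-9A-Fa-f]{2}\x20?){4}(?:\r\n|\n|\r)", re.MULTILINE)  # guy038
STRICT = re.compile(r"^@[0-9A-Fa-f]{2}(?:\x20[0-9A-Fa-f]{2}){3}(?:\r\n|\n|\r)", re.MULTILINE)  # Terry

samples = ["@0C 73 20 2C\n", "@12 3456 7A\n", "@12345678\n"]

for line in samples:
    # bool(...) shows whether each pattern matches the line from its start
    print(repr(line), bool(LENIENT.match(line)), bool(STRICT.match(line)))

# Output:
# '@0C 73 20 2C\n' True True
# '@12 3456 7A\n' True False
# '@12345678\n' True False
```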

    I welcome your feedback.

    Terry



  • May I ask why not just use ^@.{11}$ ?
    Shouldn’t this fulfil the OP’s request?
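    A quick Python sketch of this pattern (Python's re is used only for illustration). It shows two things worth keeping in mind: the . accepts any character, so a line like @hello world also matches, and $ stops before the line break, so an empty replacement leaves a blank line where each match was. The variant that also consumes \n (a hypothetical tweak, not from the post) removes the lines outright:

```python
import re

# ^@.{11}$ : an @ at the start of a line, then exactly 11 more characters
# (any except \n), then end of line. re.MULTILINE makes ^ and $ per-line.
PATTERN = re.compile(r"^@.{11}$", re.MULTILINE)

text = "keep me\n@0C 73 20 2C\n@hello world\nalso keep\n"

# $ does not consume the line break, so each matched line becomes empty.
blanked = PATTERN.sub("", text)
print(repr(blanked))  # 'keep me\n\n\nalso keep\n'

# Hypothetical variant: consuming the \n as well removes the line entirely.
removed = re.sub(r"^@.{11}\n", "", text, flags=re.MULTILINE)
print(repr(removed))  # 'keep me\nalso keep\n'
```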

    Cheers
    Claudia



  • Hi @Claudia-Frank, I suppose the real question is how we best fulfil the OP’s request. All the regexes supplied would indeed do as requested. Some are more accurate in their aim than others.

    Without knowing the full extent of the data in the files, we can never be sure that we aren’t causing some ‘harm’ to it. If we keep the regex as tight as we can, and also state our assumptions as a caveat, we can be happy that we’ve done our bit.

    Terry



  • Hi, @terry-r and All,

    You’re perfectly right! I did not notice the other possible matches caused by the optional quantifier ( ? ). So, when I said “… a more strict regex…”, I should rather have said “… a less strict regex…” :-((

    Finally, based on the OP’s needs:

    All lines that begin with @ and are the same length (12 chars)

    Claudia’s solution seems just right, doesn’t it?


    Of course, if no other line in the file begins with the @ character, we could simply use the regex S/R:

    SEARCH ^@.+\R

    REPLACE Leave EMPTY
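    Sketched in Python under the same kind of assumption (\R approximated by (?:\r\n|\n|\r), as Python's re has no \R), this removes every line that starts with @, whatever follows it, which is why it is only safe when no other line begins with @:

```python
import re

# ^@.+           : a line starting with @ followed by at least one character
# (?:\r\n|\n|\r) : assumed stand-in for \R, so the line break is removed too
PATTERN = re.compile(r"^@.+(?:\r\n|\n|\r)", re.MULTILINE)

text = "keep me\n@0C 73 20 2C\n@ anything at all\nalso keep\n"

# Both @ lines are deleted, including the one that is not hex data.
result = PATTERN.sub("", text)
print(repr(result))  # 'keep me\nalso keep\n'
```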

    Cheers,

    guy038



  • That’s the wonderful world of regex - multiple solutions for one problem :-)

    Cheers
    Claudia

