How to find two or more non-consecutive tabs in a line?



  • Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones and All,

    Fundamentally, Alan’s new solution and mine give the same correct results, i.e. both match any non-empty line which does not contain a tabulation character !

    By the way, we both forgot to add the leading in-line modifier (?-s). It ensures that, even if you previously ticked the ". matches newline" option, the regex engine treats any . char as matching a single standard character only !

    So, our two solutions should be :

    Alan : (?-s)^((?!\t).)+$

    Guy : (?-s)(?!.*\t)^.+


    However, note that the logic underlying these 2 regular expressions is a bit different :

    • In Alan’s regex, from the beginning of the line ( ^ ), the regex engine matches one or more standard characters, up to the end of the line ( $ ), ONLY IF each standard character encountered is not a tabulation character, due to the negative look-ahead (?!\t), located right before the . regex character

    • In Guy’s regex, the regex engine matches all the standard characters of a line ( ^.+ ), ONLY IF ( implicitly at the beginning of the line ) it cannot find a tabulation character further on, at any position of the current line, due to the negative look-ahead (?!.*\t)
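
    To see that the two approaches select the same lines, here is a minimal sketch in Python. Note the assumptions : Python's re module is not the Boost engine used by Notepad++, so this only illustrates the logic. The (?-s) prefix is dropped because, in Python, . does not match a newline unless re.DOTALL is passed, and re.MULTILINE is added so that ^ and $ anchor at every line. The sample text is made up for the illustration.

```python
import re

# Assumption: Python's re stands in for Notepad++'s Boost engine.
# No re.DOTALL here, so "." never matches "\n" (the role of (?-s)).
# re.MULTILINE makes ^ and $ match at each line boundary.
ALAN = re.compile(r"^((?!\t).)+$", re.MULTILINE)
GUY = re.compile(r"(?!.*\t)^.+", re.MULTILINE)

# Hypothetical sample: two tab-free lines, two lines with tabs, one empty line.
sample = "no tabs here\none\ttab\n\ntwo\ttabs\there\nplain again"

alan_hits = [m.group(0) for m in ALAN.finditer(sample)]
guy_hits = [m.group(0) for m in GUY.finditer(sample)]

print(alan_hits)  # ['no tabs here', 'plain again']
print(guy_hits)   # ['no tabs here', 'plain again']
```

    Both patterns skip the empty line, because each of them needs at least one character ( the + quantifier ), and both skip any line that holds a tabulation character.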

    I did a test with a file of 2,500,000 lines, half of which contained 1 tabulation character and, clearly, Alan’s version is faster ! ( 2 min 15 s for Alan’s version, versus 5 min for mine )
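
    A comparison of this kind could be sketched as follows in Python. This is only a rough harness under stated assumptions : it uses Python's re engine, not Notepad++'s Boost engine, so the timings, and possibly even which regex wins, may differ from the figures above. The helper names build_text and count_and_time, the line contents and the line count are all hypothetical.

```python
import re
import time

# Assumption: same Python adaptations as for any re-based test of these
# patterns (no re.DOTALL, re.MULTILINE for per-line anchors).
ALAN = re.compile(r"^((?!\t).)+$", re.MULTILINE)
GUY = re.compile(r"(?!.*\t)^.+", re.MULTILINE)


def build_text(n_lines):
    """Build n_lines lines, every second one holding exactly one tab."""
    lines = [
        "plain text line" if i % 2 == 0 else "text with\tone tab"
        for i in range(n_lines)
    ]
    return "\n".join(lines)


def count_and_time(pattern, text):
    """Return (number of matches, elapsed seconds) for one full scan."""
    start = time.perf_counter()
    count = sum(1 for _ in pattern.finditer(text))
    return count, time.perf_counter() - start


text = build_text(20000)  # far smaller than the 2,500,000-line file
alan_count, alan_secs = count_and_time(ALAN, text)
guy_count, guy_secs = count_and_time(GUY, text)

print(f"Alan: {alan_count} lines matched in {alan_secs:.3f} s")
print(f"Guy : {guy_count} lines matched in {guy_secs:.3f} s")
```

    Both counts should be identical ( the tab-free half of the lines ); only the elapsed times are expected to differ.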

    BR

    guy038

