How to find two or more non-consecutive tabs in a line?
-
I can confirm that it finds a line that contains two tabs but if a line doesn’t meet the criteria, it looks further (greedy, you say? :) )and hence finds the following line together, which in the end looks like “every other line”. But I’m pretty sure it skips the \r\n.of a line if this line contains only one tab. Can you limit the regex, so it should look for and within only one line (by line, I mean anything between ^ and \r\n).
-
-
maybe a screenshot helps:
-
I can’t see the screenshots above - neither on this page nor when clicking on it. All I see is a broken-image-file-icon and “Imgur” next to it.
-
Okay, one more try. It could be as simple(!) as changing it to this:
(?-s)^.*?\t(?!\t).+?\t.*?$
:)
-
Thanks, that now works like a charm! :)
While we are at it, how about building another regex that locates a line that contains no tab? :)
-
@glossar said:
regex that locates a line that contains no tab?
There might be better ones, but this one seems to work:
^((?!\t).)*$
-
Hi, @glossar, @alan-kilborn, and All,
A second solution could be :
SEARCH
(?-s)(?=.*\t.*\t).+
A third solution could be, using the Mark dialog, w/o checking the
Bookmark line
option :MARK
(?-s)\t.*\t
Note, @alan-kilborn, that your regex should be changed into :
SEARCH
(?-s)^.*?\t[^\t\r\n]+\t.*?$
To avoid wrong multi-lines match. However, this solution still misses some possibilities !
You may test these
3
regexes, above, against the sample test, below :---------------------------- 1 TEXT block without TAB -----> KO <----- ( because NO tabulation ) abcd ---------------------------- 1 TAB without TEXT ----------> KO <----- ( because ONE tabulation ONLY ) ---------------------------- 2 TABs without TEXT ----------- OK ------ ---------------------------- 3 TABs without TEXT ----------- OK ------ ---------------------------- 1 TAB + 1 TEXT block --------> KO <----- ( because ONE tabulation ONLY ) abcd abcd ---------------------------- 1 TAB + 2 TEXT blocks -------> KO <----- ( because ONE tabulation ONLY ) abcd efgh ---------------------------- 2 TABs + 1 TEXT block --------- OK ------ efgh efgh efgh ---------------------------- 2 TABs + 2 TEXT blocks -------- OK ------ abcd efgh abcd ijkm efgh ijkl ---------------------------- 2 TABs + 3 TEXT blocks -------- OK ------ abcd efgh ijkl ---------------------------- 3 TABs + 1 Text block --------- OK ------ abcd efgh ijkl mnop ---------------------------- 3 TABs + 2 Text blocks -------- OK ------ abcd efgh abcd ijkl abcd monp efgh ijkl efgh monp ijkl monp ---------------------------- 3 TABs + 3 Text blocks -------- OK ------ abcd efgh ijkm efgh ijkl mnop ---------------------------- 3 TABs + 4 Text blocks -------- OK ------ abcd efgh ijkl mnop
Best Regards,
guy038
-
@glossar , @Alan-Kilborn , @Meta-Chuh , et alia,
Unfortunately, the
(?-s)
only changes the behavior of.
with respect to newlines; it doesn’t change character classes, so[^\t]+
means “one or more characters that don’t match a TAB, even if those characters are newlines”. By changing the full regex to(?-s)^.*?\t[^\t\r\n]+\t.*?$
, I was able to get it to skip lines like @Meta-Chuh 's example ofx
instead of the TAB. The class[^\t\r\n]
means “match one or more characters that isn’t any ofTAB
,CR
(carriage return), orLF
(line-feed)”I am not as regex expert as @guy038, so I may be misinterpreting; however, the boost docs say (emphasis mine)
Escaped Characters
All the escape sequences that match a single character, or a single character class are permitted within a character class definition. For example [[]] would match either of [ or ] while [\W\d] would match any character that is either a “digit”, or is not a “word” character.Since
\R
doesn’t match a “single character” (it can match a single character ora pair of charactersmore than one character, see boost’s “Matching Line Endings” section), it doesn’t fall within the allowable escape sequences permitted in the character class.edit: while typing this up, four more posts were made. Hopefully, I still added to the discussion.
edit 2: clarify the\R
-
@PeterJones said:
Hopefully, I still added to the discussion.
You did, and you helped make it an “interesting discussion”. thanks.
-
Alan, the second one that finds no-tab :), works, thank you.
Guy and Peter - Thank you for stepping-in! :) Much appreciated!
Have a nice day!
-
Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,
Here is an other solution, which looks for all contents of lines containing, at least ,
2
tabulation chars ( can’t do shorter ! ) :SEARCH
(?-s).*\t.*\t.*
Just for information, an other formulation of the Alan’s regex, which searches lines which do not contain any tabulation char, could be :
SEARCH
(?!.*\t)^.+
Negative character classes are often misunderstood, Indeed ! When you’re using, for instance, the negative class character below :
[^<char1><char2><char3>-<char4>]
It will match ANY Unicode character which is DIFFERENT from, either
<char1>
,<char2>
and all characters between<char3>
and<char4>
included. So, most of the time, it probably matches the\r
and\n
END of Line characters. To avoid matching these line-break chars, just insert\r
and\n
, inside the negative class, at any location, after the^
, except in ranges :[^<char1>\n<char2>\t<char3>-<char4>]
Cheers,
guy038
-
@Alan-Kilborn said:
@glossar said:
regex that locates a line that contains no tab?
There might be better ones, but this one seems to work:
^((?!\t).)*$
Hi @alan-kilborn,
Is it possible for you to modify this regex so shat it should skip blank lines, i.e. the ones containing no characters at all, just (if applicable, ^ and) \r\n. Currently the regex finds blank lines as well since they , too, meet the criteria “no-tab”.Thanks in advance!
-
Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,
I may be mistaken but I think that the regex
(?!.*\t)^.+
, of my previous post, just meet your needs, doesn’t it ?Cheers,
guy038
-
@glossar said:
Is it possible for you to modify this regex so shat it should skip blank lines
So we should look at what the original means:
^((?!\t).)*$
It says (basically) to match zero or more occurrences (because of the use of
*
) of anything that is not TAB. If we change it to match ONE or more occurrences (we’re going to change*
to+
to do this) of anything that is not TAB). Because we have to match at least ONE thing, empty/blank lines are no longer matched:^((?!\t).)+$
Which is basically what @guy038 said, but I wanted to elaborate a bit!
-
Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones and All,
Fundamentally, the new Alan’s solution and mine give the same right results, i.e. to match any non-empty line which does not contain a tabulation character !
By the way, we, both, forget to add the leading in-line-modifier
(?-s)
to be sure that, even you previously ticked the. matches newline
option, the regex engine will suppose that any.
char does match a single standard character, only !So, our two solutions should be :
Alan :
(?-s)^((?!\t).)+$
Guy :
(?-s)(?!.*\t)^.+
However, note that the logic, underlying these
2
regular expressions, is a bit different :-
In the Alan’s regex, from beginning of line (
^
), the regex engine matches for one or more standard characters, till the end of line ($
), ONLY IF each standard character encountered is not a tabulation character, due to the negative look-ahead(?!\t)
, located right before the.
regex character -
In the Guy’s regex, the regex engine matches for all the standard characters of a line, (
^.+
), ONLY IF ( implicitly at beginning of line ) it cannot find a tabulation character further on, at any position of current line, due to the negative look-ahead(?!.*\t)
I did a test with a file of
2,500,000
lines, half of which contained 1 tabulation character and, clearly, the Alan’s version is faster ! (2 mn 15 s
for Alan instead of5mn
for my version )BR
guy038
-