    How to find two or more non-consecutive tabs in a line?

    Scheduled Pinned Locked Moved General Discussion
    21 Posts 5 Posters 4.6k Views
    • Meta ChuhM
      Meta Chuh moderator @glossar
      last edited by Meta Chuh

      maybe a screenshot helps:
      Imgur

      • glossarG
        glossar
        last edited by

        I can’t see the screenshots above - neither on this page nor when clicking on them. All I see is a broken-image-file-icon and “Imgur” next to it.

        • Alan KilbornA
          Alan Kilborn
          last edited by

          Okay, one more try. It could be as simple(!) as changing it to this:

          (?-s)^.*?\t(?!\t).+?\t.*?$

          :)
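To try this pattern outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module rather than the Boost engine Notepad++ uses, so the leading (?-s) is dropped (Python's `.` already excludes `\n` by default), and the helper name `has_non_consecutive_tabs` is made up for illustration.

```python
import re

# The pattern from the post, minus the leading (?-s):
# Python's "." does not match "\n" unless re.DOTALL is set.
NON_CONSECUTIVE_TABS = re.compile(r"^.*?\t(?!\t).+?\t.*?$")

def has_non_consecutive_tabs(line: str) -> bool:
    """True if the line has a tab, then a non-tab character, then a later tab."""
    return NON_CONSECUTIVE_TABS.search(line) is not None

samples = [
    "abcd",                 # no tab
    "abcd\tefgh",           # one tab only
    "abcd\t\tefgh",         # two tabs, but consecutive
    "abcd\tefgh\tijkl",     # two tabs separated by text
    "\t\tabcd\tefgh",       # consecutive tabs, then a separated one
]

for line in samples:
    print(repr(line), has_non_consecutive_tabs(line))
```

Only the last two samples are reported as matches, because they are the only ones in which some text sits between two tabs.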

          • glossarG
            glossar
            last edited by

            Thanks, that now works like a charm! :)

            While we are at it, how about building another regex that locates a line that contains no tab? :)

            • Alan KilbornA
              Alan Kilborn @glossar
              last edited by

              @glossar said:

              regex that locates a line that contains no tab?

              There might be better ones, but this one seems to work:

              ^((?!\t).)*$

              • guy038G
                guy038
                last edited by guy038

                Hi, @glossar, @alan-kilborn, and All,

                A second solution could be :

                SEARCH (?-s)(?=.*\t.*\t).+

                A third solution could be, using the Mark dialog, w/o checking the Bookmark line option :

                MARK (?-s)\t.*\t


                Note, @alan-kilborn, that your regex should be changed into :

                SEARCH (?-s)^.*?\t[^\t\r\n]+\t.*?$

                to avoid wrongly matching across multiple lines. However, this solution still misses some cases !


                You may test these 3 regexes, above, against the sample test, below :

                ---------------------------- 1 TEXT block without TAB -----> KO <----- ( because NO tabulation )
                abcd
                ---------------------------- 1 TAB  without TEXT ----------> KO <----- ( because ONE tabulation ONLY )
                	
                ---------------------------- 2 TABs without TEXT ----------- OK ------
                		
                ---------------------------- 3 TABs without TEXT ----------- OK ------
                			
                ---------------------------- 1 TAB  + 1 TEXT block --------> KO <----- ( because ONE tabulation ONLY )
                abcd	
                	abcd
                ---------------------------- 1 TAB  + 2 TEXT blocks -------> KO <----- ( because ONE tabulation ONLY )
                abcd	efgh
                ---------------------------- 2 TABs + 1 TEXT block --------- OK ------
                efgh		
                	efgh	
                		efgh
                ---------------------------- 2 TABs + 2 TEXT blocks -------- OK ------
                abcd	efgh	
                abcd		ijkm
                	efgh	ijkl
                ---------------------------- 2 TABs + 3 TEXT blocks -------- OK ------
                abcd	efgh	ijkl
                ---------------------------- 3 TABs + 1 Text block --------- OK ------
                abcd			
                	efgh		
                		ijkl	
                			mnop
                ---------------------------- 3 TABs + 2 Text blocks -------- OK ------
                abcd	efgh		
                abcd		ijkl	
                abcd			monp
                	efgh	ijkl	
                	efgh		monp
                		ijkl	monp
                ---------------------------- 3 TABs + 3 Text blocks -------- OK ------
                abcd	efgh	ijkm	
                	efgh	ijkl	mnop
                ---------------------------- 3 TABs + 4 Text blocks -------- OK ------
                abcd	efgh	ijkl	mnop
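The sample above can also be checked mechanically. The following is a rough Python sketch, not a Notepad++ feature: it assumes Python's `re` module (so (?-s) is omitted, since `.` excludes `\n` by default there), and `classify` is a hypothetical helper. It compares the "at least 2 tabs" SEARCH regex with the corrected non-consecutive one on a few lines taken from the sample.

```python
import re

# guy038's SEARCH regex: any line containing at least two tabs.
AT_LEAST_TWO_TABS = re.compile(r"(?=.*\t.*\t).+")
# The corrected non-consecutive version: needs text between two tabs.
NON_CONSECUTIVE = re.compile(r"^.*?\t[^\t\r\n]+\t.*?$")

def classify(line: str) -> dict:
    """Report the tab count and whether each regex matches the line."""
    return {
        "tabs": line.count("\t"),
        "two_or_more": AT_LEAST_TWO_TABS.search(line) is not None,
        "non_consecutive": NON_CONSECUTIVE.search(line) is not None,
    }

sample_lines = [
    "abcd",               # no tab            -> KO
    "\t",                 # one tab           -> KO
    "\t\t",               # two tabs, no text -> OK
    "abcd\tefgh",         # one tab           -> KO
    "efgh\t\t",           # two tabs          -> OK
    "abcd\t\tijkm",       # two tabs          -> OK
    "abcd\tefgh\tijkl",   # two tabs          -> OK
]

for line in sample_lines:
    print(repr(line), classify(line))
```

Lines such as two bare tabs are found by the first regex but not by the second, which is the kind of case the non-consecutive version misses.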
                

                Best Regards,

                guy038

                • PeterJonesP
                  PeterJones
                  last edited by PeterJones

                  @glossar , @Alan-Kilborn , @Meta-Chuh , et alia,

                  Unfortunately, the (?-s) only changes the behavior of . with respect to newlines; it doesn’t change character classes, so [^\t]+ means “one or more characters that aren’t a TAB, even if those characters are newlines”. By changing the full regex to (?-s)^.*?\t[^\t\r\n]+\t.*?$, I was able to get it to skip lines like @Meta-Chuh 's example with x instead of the TAB. The expression [^\t\r\n]+ means “match one or more characters, none of which is a TAB, CR (carriage return), or LF (line feed)”.
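The point about negated classes crossing line boundaries can be reproduced with a small Python sketch. This assumes Python's `re` module instead of Boost, but the behavior of a negated class is the same in this respect; the variable names are only illustrative.

```python
import re

# Two lines, each containing exactly ONE tab.
text = "one\ttwo\nthree\tfour\n"

loose = re.compile(r"\t[^\t]+\t")        # negated class also accepts "\n"
strict = re.compile(r"\t[^\t\r\n]+\t")   # line breaks excluded explicitly

loose_match = loose.search(text)
strict_match = strict.search(text)

# The loose pattern runs from the tab on line 1 to the tab on line 2.
print(repr(loose_match.group()))
# The strict pattern finds nothing, since no single line has two tabs.
print(strict_match)
```

The first pattern reports a match even though neither line contains two tabs, which is exactly the wrong multi-line match being described.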

                  I am not as much of a regex expert as @guy038, so I may be misinterpreting; however, the boost docs say (emphasis mine)

                  Escaped Characters
                  All the escape sequences that match a single character, or a single character class are permitted within a character class definition. For example [\[\]] would match either of [ or ] while [\W\d] would match any character that is either a “digit”, or is not a “word” character.

                  Since \R doesn’t match a “single character” (it can match either a single character or a sequence of more than one character; see boost’s “Matching Line Endings” section), it doesn’t fall within the escape sequences permitted in a character class.

                  edit: while typing this up, four more posts were made. Hopefully, I still added to the discussion.
                  edit 2: clarify the \R

                  • Alan KilbornA
                    Alan Kilborn @PeterJones
                    last edited by

                    @PeterJones said:

                    Hopefully, I still added to the discussion.

                    You did, and you helped make it an “interesting discussion”. thanks.

                    • glossarG
                      glossar
                      last edited by

                      Alan, the second one that finds no-tab :), works, thank you.

                      Guy and Peter - Thank you for stepping-in! :) Much appreciated!

                      Have a nice day!

                      • guy038G
                        guy038
                        last edited by guy038

                        Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                        Here is another solution, which matches the entire contents of lines containing at least 2 tabulation chars ( can’t do shorter ! ) :

                        SEARCH (?-s).*\t.*\t.*

                        Just for information, another formulation of Alan’s regex, which searches for lines that do not contain any tabulation char, could be :

                        SEARCH (?!.*\t)^.+


                        Negated character classes are indeed often misunderstood ! When you use, for instance, the negated character class below :

                        [^<char1><char2><char3>-<char4>]

                        It will match ANY Unicode character that is DIFFERENT from <char1>, from <char2>, and from every character between <char3> and <char4> inclusive. So, most of the time, it also matches the \r and \n end-of-line characters. To avoid matching these line-break chars, just insert \r and \n inside the negated class, at any location after the ^, except inside a range :

                        [^<char1>\n<char2>\r<char3>-<char4>]

                        Cheers,

                        guy038

                        • glossarG
                          glossar @Alan Kilborn
                          last edited by glossar

                          @Alan-Kilborn said:

                          @glossar said:

                          regex that locates a line that contains no tab?

                          There might be better ones, but this one seems to work:

                          ^((?!\t).)*$

                          Hi @alan-kilborn,
                          Is it possible for you to modify this regex so that it skips blank lines, i.e. the ones containing no characters at all, just (if applicable, ^ and) \r\n? Currently the regex finds blank lines as well, since they, too, meet the “no-tab” criterion.

                          Thanks in advance!

                          • guy038G
                            guy038
                            last edited by guy038

                            Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                            I may be mistaken, but I think that the regex (?!.*\t)^.+, from my previous post, already meets your needs, doesn’t it ?

                            Cheers,

                            guy038

                            • Alan KilbornA
                              Alan Kilborn @glossar
                              last edited by

                              @glossar said:

                              Is it possible for you to modify this regex so that it skips blank lines

                              So we should look at what the original means:

                              ^((?!\t).)*$

                              It says (basically) to match zero or more occurrences (because of the use of *) of anything that is not TAB. Suppose we change it to match ONE or more occurrences of anything that is not TAB (by changing * to +). Because we now have to match at least ONE thing, empty/blank lines are no longer matched:

                              ^((?!\t).)+$

                              Which is basically what @guy038 said, but I wanted to elaborate a bit!
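The effect of switching * to + can be seen in a short Python sketch. It assumes Python's `re` module rather than Notepad++'s Boost engine, and the list names are only for illustration.

```python
import re

NO_TAB_STAR = re.compile(r"^((?!\t).)*$")   # zero or more non-tab characters
NO_TAB_PLUS = re.compile(r"^((?!\t).)+$")   # one or more non-tab characters

lines = ["abcd", "", "ab\tcd", "   "]

# Each line is tested on its own, so ^ and $ anchor to the whole line.
star_hits = [line for line in lines if NO_TAB_STAR.search(line)]
plus_hits = [line for line in lines if NO_TAB_PLUS.search(line)]

print(star_hits)   # includes the empty line
print(plus_hits)   # the empty line is gone
```

Both versions reject the line containing a tab; only the * version accepts the empty line.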

                              • guy038G
                                guy038
                                last edited by

                                Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones and All,

                                Fundamentally, Alan’s new solution and mine give the same correct results, i.e. they match any non-empty line which does not contain a tabulation character !

                                By the way, we both forgot to add the leading in-line modifier (?-s), which ensures that, even if you previously ticked the . matches newline option, the regex engine treats any . as matching a single standard character only !

                                So, our two solutions should be :

                                Alan : (?-s)^((?!\t).)+$

                                Guy : (?-s)(?!.*\t)^.+


                                However, note that the logic, underlying these 2 regular expressions, is a bit different :

                                • In Alan’s regex, from the beginning of the line ( ^ ), the regex engine matches one or more standard characters, up to the end of the line ( $ ), ONLY IF each standard character encountered is not a tabulation character, due to the negative look-ahead (?!\t) located right before the . regex character

                                • In Guy’s regex, the regex engine matches all the standard characters of a line ( ^.+ ), ONLY IF ( implicitly at the beginning of the line ) it cannot find a tabulation character further on, at any position of the current line, due to the negative look-ahead (?!.*\t)

                                I did a test with a file of 2,500,000 lines, half of which contained 1 tabulation character and, clearly, Alan’s version is faster ! ( 2 min 15 s for Alan’s version instead of 5 min for mine )
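A comparison along these lines can be sketched in Python. This is only an approximation of the test described: it assumes Python's `re` engine, which differs from the Boost engine in Notepad++, so any timings it prints are not comparable with the figures above. The (?-s) modifier is dropped because Python's `.` excludes `\n` by default, and `matched_lines` is a hypothetical helper.

```python
import re
import timeit

# re.MULTILINE makes ^ and $ anchor at every line of a multi-line text.
ALAN = re.compile(r"^((?!\t).)+$", re.MULTILINE)
GUY = re.compile(r"(?!.*\t)^.+", re.MULTILINE)

def matched_lines(pattern, text):
    """Return every whole-line match of the pattern in the text."""
    return [m.group() for m in pattern.finditer(text)]

# Small synthetic file: tab-free lines, one-tab lines and blank lines.
text = "\n".join(["abcd efgh", "abcd\tefgh", ""] * 500)

alan_result = matched_lines(ALAN, text)
guy_result = matched_lines(GUY, text)
print(alan_result == guy_result, len(alan_result))

# Timings depend on the engine and machine, so no figures are claimed here.
for name, pattern in (("Alan", ALAN), ("Guy", GUY)):
    seconds = timeit.timeit(lambda: matched_lines(pattern, text), number=3)
    print(name, round(seconds, 4))
```

On this input both patterns select the same lines (the non-empty ones without a tab), which matches the statement that the two solutions give the same results even though their logic differs.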

                                BR

                                guy038

                                The Community of users of the Notepad++ text editor.