    How to find two or more non-consecutive tabs in a line?

    • glossar

      Hi all,
      How can I find a line that contains 2 or more non-consecutive tabs? One of the tabs may or may not be at the beginning and/or at the end of the line.

      I tried to adapt a regex for a similar task, but with no success: " ^.(\t){2,}.\r\n ", "^.(?:\t*){2,}\r\n".

      Thanks in advance!

      • Alan Kilborn @glossar

        @glossar

        How about this?:

        (?-s)^.*?\t[^\t]+\t.*?$

        • glossar

          Hi Alan,

          Thank you but sadly it won’t work. It finds only two tabs, each in every other line, at least in my file, whereas it should locate a line that contains 2 or more tabs (e.g.: blah [tab] blah blah more blah [tab] (blah blah [tab] blah)… ).

          • Alan Kilborn @Alan Kilborn

            This raises a possibly interesting discussion: when are characters inside a character class, that is, between [ and ], non-literal? When I first crafted the above regex, I thought it wasn’t going to work: it would look for \ or t separately, not “tab” characters. But lo and behold, it does look for tabs. What are the rules for this?

            I know that [\R] will match \ or R and not match \R but that may be a special case and invalid because it can match possibly 2 characters, not just one.

            But there must be some general rules on what is special inside […] and [^…], besides the “specialness” of - when used as a range, for example [a-z], and the special way needed to get ] included in the set…

            • Alan Kilborn @glossar

              @glossar said:

              Thank you but sadly it won’t work.

              Hmmm. Works for me with a Mark operation shown here:

              Imgur

              I copied your text from this thread, did a regex replace on it for \[tab\] with \t…and then applied the regex specified earlier to redmark the text.

              • glossar

                I can confirm that it finds a line that contains two tabs, but if a line doesn’t meet the criteria, it looks further (greedy, you say? :) ) and hence matches the following line together with it, which in the end looks like “every other line”. I’m pretty sure it skips the \r\n of a line if this line contains only one tab. Can you limit the regex so that it looks within one line only (by line, I mean anything between ^ and \r\n)?

                • Alan Kilborn @glossar

                  @glossar

                  Ah, yes, okay, that makes sense. The [^\t]+ will capture across line-boundaries. At this point I will bow out and let the regex master @guy038 step in… :)

                  And maybe he can comment on my “interesting discussion” post above as well.

                  • Meta Chuh (moderator) @glossar

                    maybe a screenshot helps:
                    Imgur

                    • glossar

                      I can’t see the screenshots above - neither on this page nor when clicking on it. All I see is a broken-image-file-icon and “Imgur” next to it.

                      • Alan Kilborn

                        Okay, one more try. It could be as simple(!) as changing it to this:

                        (?-s)^.*?\t(?!\t).+?\t.*?$

                        :)

                        • glossar

                          Thanks, that now works like a charm! :)

                          While we are at it, how about building another regex that locates a line that contains no tab? :)

                          • Alan Kilborn @glossar

                            @glossar said:

                            regex that locates a line that contains no tab?

                            There might be better ones, but this one seems to work:

                            ^((?!\t).)*$

                            • guy038

                              Hi, @glossar, @alan-kilborn, and All,

                              A second solution could be :

                              SEARCH (?-s)(?=.*\t.*\t).+

                              A third solution could be, using the Mark dialog, w/o checking the Bookmark line option :

                              MARK (?-s)\t.*\t


                              Note, @alan-kilborn, that your regex should be changed into :

                              SEARCH (?-s)^.*?\t[^\t\r\n]+\t.*?$

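                              This avoids incorrect multi-line matches. However, this solution still misses some possibilities, for example lines whose tabs are all adjacent, because [^\t\r\n]+ requires at least one other character between two tabs!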


                              You may test these 3 regexes, above, against the sample text below:

                              ---------------------------- 1 TEXT block without TAB -----> KO <----- ( because NO tabulation )
                              abcd
                              ---------------------------- 1 TAB  without TEXT ----------> KO <----- ( because ONE tabulation ONLY )
                              	
                              ---------------------------- 2 TABs without TEXT ----------- OK ------
                              		
                              ---------------------------- 3 TABs without TEXT ----------- OK ------
                              			
                              ---------------------------- 1 TAB  + 1 TEXT block --------> KO <----- ( because ONE tabulation ONLY )
                              abcd	
                              	abcd
                              ---------------------------- 1 TAB  + 2 TEXT blocks -------> KO <----- ( because ONE tabulation ONLY )
                              abcd	efgh
                              ---------------------------- 2 TABs + 1 TEXT block --------- OK ------
                              efgh		
                              	efgh	
                              		efgh
                              ---------------------------- 2 TABs + 2 TEXT blocks -------- OK ------
                              abcd	efgh	
                              abcd		ijkm
                              	efgh	ijkl
                              ---------------------------- 2 TABs + 3 TEXT blocks -------- OK ------
                              abcd	efgh	ijkl
                              ---------------------------- 3 TABs + 1 Text block --------- OK ------
                              abcd			
                              	efgh		
                              		ijkl	
                              			mnop
                              ---------------------------- 3 TABs + 2 Text blocks -------- OK ------
                              abcd	efgh		
                              abcd		ijkl	
                              abcd			monp
                              	efgh	ijkl	
                              	efgh		monp
                              		ijkl	monp
                              ---------------------------- 3 TABs + 3 Text blocks -------- OK ------
                              abcd	efgh	ijkm	
                              	efgh	ijkl	mnop
                              ---------------------------- 3 TABs + 4 Text blocks -------- OK ------
                              abcd	efgh	ijkl	mnop
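
As a rough check of the three regexes above, here is a minimal Python sketch. It assumes that Python's re module agrees with the Boost engine used by Notepad++ for these particular patterns, drops the (?-s) prefix because . already stops at \n in Python by default, and uses a small hypothetical subset of the sample lines, each tested as a separate string.

```python
import re

# Hypothetical subset of the sample lines, written with explicit \t escapes.
lines = [
    "abcd",              # 0: no tab
    "abcd\tefgh",        # 1: one tab only
    "\t\t",              # 2: two adjacent tabs, no text
    "abcd\t\tijkm",      # 3: two adjacent tabs between text blocks
    "abcd\tefgh\tijkl",  # 4: two tabs separated by text
]

lookahead = re.compile(r"(?=.*\t.*\t).+")          # the SEARCH regex
mark = re.compile(r"\t.*\t")                       # the MARK regex
corrected = re.compile(r"^.*?\t[^\t\r\n]+\t.*?$")  # the corrected version of Alan's regex


def hits(pattern):
    """Return the indices of the lines in which the pattern finds a match."""
    return [i for i, line in enumerate(lines) if pattern.search(line)]


print(hits(lookahead))  # [2, 3, 4]
print(hits(mark))       # [2, 3, 4]
print(hits(corrected))  # [4]
```

On this subset the first two patterns find every line with two or more tabs, while the corrected regex finds only the line whose tabs are separated by text.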
                              

                              Best Regards,

                              guy038

                              • PeterJones

                                @glossar , @Alan-Kilborn , @Meta-Chuh , et alia,

                                Unfortunately, the (?-s) only changes the behavior of . with respect to newlines; it doesn’t change character classes, so [^\t]+ means “one or more characters that aren’t a TAB, even if those characters are newlines”. By changing the full regex to (?-s)^.*?\t[^\t\r\n]+\t.*?$, I was able to get it to skip lines like @Meta-Chuh’s example of x instead of the TAB. The class [^\t\r\n] means “match a single character that isn’t TAB, CR (carriage return), or LF (line-feed)”, and the + after it repeats that one or more times.
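
The cross-line behaviour described here can be reproduced outside Notepad++. The following is a minimal sketch, assuming Python's re module as a stand-in for the Boost engine; the (?-s) prefix is left out because . does not match \n in Python unless re.DOTALL is set, and the sample text is made up for illustration.

```python
import re

# Three lines, none of which contains two tabs on its own.
text = "one\ttab only\nno tabs here\nanother\tsingle tab\n"

# Original class: [^\t] also accepts \r and \n, so it can run across lines.
loose = re.compile(r"^.*?\t[^\t]+\t.*?$", re.MULTILINE)
# Corrected class: line-break characters are excluded as well.
strict = re.compile(r"^.*?\t[^\t\r\n]+\t.*?$", re.MULTILINE)

loose_match = loose.search(text)
strict_match = strict.search(text)

print("\n" in loose_match.group())  # True: the match spans several lines
print(strict_match)                 # None: no single line has two tabs
```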

                                I am not as much of a regex expert as @guy038, so I may be misinterpreting; however, the Boost docs say (emphasis mine):

                                Escaped Characters
                                All the escape sequences that match a single character, or a single character class are permitted within a character class definition. For example [\[\]] would match either of [ or ] while [\W\d] would match any character that is either a “digit”, or is not a “word” character.

                                Since \R doesn’t match a “single character” (it can match either a single character or a pair of characters; see Boost’s “Matching Line Endings” section), it doesn’t fall within the escape sequences permitted in a character class.
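
The single-character escapes mentioned in that quote can be tried in Python as a minimal sketch. This assumes Python's re module, which is a different engine from Boost and has no \R at all, so only \t, \W and \d are shown.

```python
import re

# \t inside a class still means the TAB character, not a backslash or the letter t.
tab_class = re.compile(r"[\t]")
print(bool(tab_class.fullmatch("\t")))  # True
print(bool(tab_class.fullmatch("t")))   # False

# \W and \d each stand for exactly one character, so they can be combined in a class.
digit_or_nonword = re.compile(r"[\W\d]")
matches = [c for c in "a7_-" if digit_or_nonword.fullmatch(c)]
print(matches)  # ['7', '-']
```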

                                edit: while typing this up, four more posts were made. Hopefully, I still added to the discussion.
                                edit 2: clarify the \R

                                • Alan Kilborn @PeterJones

                                  @PeterJones said:

                                  Hopefully, I still added to the discussion.

                                  You did, and you helped make it an “interesting discussion”. thanks.

                                  • glossar

                                    Alan, the second one that finds no-tab :), works, thank you.

                                    Guy and Peter - Thank you for stepping-in! :) Much appreciated!

                                    Have a nice day!

                                    • guy038

                                      Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                                      Here is another solution, which matches the whole contents of lines containing at least 2 tabulation chars (can’t do shorter!):

                                      SEARCH (?-s).*\t.*\t.*

                                      Just for information, another formulation of Alan’s regex, which searches for lines that do not contain any tabulation char, could be:

                                      SEARCH (?!.*\t)^.+
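
Both searches can be sketched in Python, assuming the re module behaves like Boost for these patterns. The (?-s) prefix is dropped because . excludes \n by default in Python, and the sample text is hypothetical.

```python
import re

# Lines: no tab, blank, one tab, two separated tabs, two adjacent tabs.
text = "abcd\n\nabcd\tefgh\nabcd\tefgh\tijkl\n\t\t\n"

# Whole contents of lines containing at least two tabs.
two_or_more = re.compile(r".*\t.*\t.*")
# Non-empty lines that contain no tab at all.
no_tab = re.compile(r"(?!.*\t)^.+", re.MULTILINE)

two_plus_lines = [m.group() for m in two_or_more.finditer(text)]
no_tab_lines = [m.group() for m in no_tab.finditer(text)]

print(two_plus_lines)  # ['abcd\tefgh\tijkl', '\t\t']
print(no_tab_lines)    # ['abcd']
```

Note that the blank line and the one-tab line are matched by neither pattern.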


                                      Negative character classes are often misunderstood, indeed! When you’re using, for instance, the negative character class below:

                                      [^<char1><char2><char3>-<char4>]

                                      It will match ANY Unicode character which is DIFFERENT from <char1>, <char2>, and all characters between <char3> and <char4> inclusive. So, most of the time, it also matches the \r and \n end-of-line characters. To avoid matching these line-break chars, just insert \r and \n inside the negative class, at any location after the ^, except inside a range:

                                      [^<char1>\n<char2>\r<char3>-<char4>]

                                      Cheers,

                                      guy038

                                      • glossar @Alan Kilborn

                                        @Alan-Kilborn said:

                                        @glossar said:

                                        regex that locates a line that contains no tab?

                                        There might be better ones, but this one seems to work:

                                        ^((?!\t).)*$

                                        Hi @alan-kilborn,
                                        Is it possible for you to modify this regex so that it skips blank lines, i.e. the ones containing no characters at all, just (if applicable, ^ and) \r\n? Currently the regex finds blank lines as well, since they, too, meet the “no-tab” criterion.

                                        Thanks in advance!

                                        • guy038

                                          Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                                          I may be mistaken, but I think that the regex (?!.*\t)^.+ from my previous post already meets your needs, doesn’t it?

                                          Cheers,

                                          guy038

                                          • Alan Kilborn @glossar

                                            @glossar said:

                                            Is it possible for you to modify this regex so that it should skip blank lines

                                            So we should look at what the original means:

                                            ^((?!\t).)*$

                                            It says (basically) to match zero or more occurrences (because of the use of *) of anything that is not TAB. We can change it to match ONE or more occurrences of anything that is not TAB by changing * to +. Because we then have to match at least ONE thing, empty/blank lines are no longer matched:

                                            ^((?!\t).)+$

                                            Which is basically what @guy038 said, but I wanted to elaborate a bit!
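
The difference between * and + here can be sketched in Python, assuming the re module as a stand-in for the Boost engine and a made-up four-line sample.

```python
import re

# Lines: no tab, blank, one tab, no tab.
text = "no tabs here\n\nhas\ta tab\nplain again"

# Zero or more non-tab characters: blank lines qualify too.
star = re.compile(r"^((?!\t).)*$", re.MULTILINE)
# One or more non-tab characters: blank lines are skipped.
plus = re.compile(r"^((?!\t).)+$", re.MULTILINE)

star_lines = [m.group() for m in star.finditer(text)]
plus_lines = [m.group() for m in plus.finditer(text)]

print(star_lines)  # ['no tabs here', '', 'plain again']
print(plus_lines)  # ['no tabs here', 'plain again']
```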

                                            The Community of users of the Notepad++ text editor.