Community

    • Login
    • Search
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Search

    How to find two or more non-consecutive tabs in a line?

    General Discussion
    5
    21
    2019
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • glossar
      glossar last edited by

      Hi all,
      How can I find a line that contains 2 or more non-consecutive tabs? One of the tabs may or may not be at the beginning and/or at the end of the line.

      I tried to adopt a regex for a similar task, but with no success: " ^.(\t){2,}.\r\n ", "^.(?:\t*){2,}\r\n" ,

      Thanks in advance!

      Alan Kilborn 1 Reply Last reply Reply Quote 1
      • Alan Kilborn
        Alan Kilborn @glossar last edited by

        @glossar

        How about this?:

        (?-s)^.*?\t[^\t]+\t.*?$

        Alan Kilborn 1 Reply Last reply Reply Quote 3
        • glossar
          glossar last edited by

          Hi Alan,

          Thank you but sadly it won’t work. It finds only two tabs, each in every other line, at least in my file, whereas it should locate a line that contain 2 or more tabs in it. (e.g.: blah [tab] blah blah more blah [tab] (blah blah [tab] blah)… ).

          Alan Kilborn 1 Reply Last reply Reply Quote 1
          • Alan Kilborn
            Alan Kilborn @Alan Kilborn last edited by

            This raises maybe an interesting discussion: When are characters inside a character class notation, which means inside [ and ] non literal? On first crafting the above regex, I thought, this isn’t going to work, it is going to look for \ or t separately, not “tab” characters. But lo and behold, it does look for tabs. What are the rules for this?

            I know that [\R] will match \ or R and not match \R but that may be a special case and invalid because it can match possibly 2 characters, not just one.

            But there must be some general rules on what is special inside […] and [^…] … besides the “specialness” of - when used as a ranger, example [a-z] and the special way needed to get ] to be included in the set…

            1 Reply Last reply Reply Quote 1
            • Alan Kilborn
              Alan Kilborn @glossar last edited by

              @glossar said:

              Thank you but sadly it won’t work.

              Hmmm. Works for me with a Mark operation shown here:

              Imgur

              I copied your text from this thread, did a regex replace on it for [tab] with \t…and then applied the regex specified earlier to redmark the text.

              1 Reply Last reply Reply Quote 2
              • glossar
                glossar last edited by

                I can confirm that it finds a line that contains two tabs but if a line doesn’t meet the criteria, it looks further (greedy, you say? :) )and hence finds the following line together, which in the end looks like “every other line”. But I’m pretty sure it skips the \r\n.of a line if this line contains only one tab. Can you limit the regex, so it should look for and within only one line (by line, I mean anything between ^ and \r\n).

                Alan Kilborn Meta Chuh 2 Replies Last reply Reply Quote 3
                • Alan Kilborn
                  Alan Kilborn @glossar last edited by Alan Kilborn

                  @glossar

                  Ah, yes, okay, that makes sense. The [^\t]+ will capture across line-boundaries. At this point I will bow out and let the regex master @guy038 step in… :)

                  And maybe he can comment on my “interesting disussion” post above as well.

                  1 Reply Last reply Reply Quote 2
                  • Meta Chuh
                    Meta Chuh @glossar last edited by Meta Chuh

                    maybe a screenshot helps:
                    Imgur

                    1 Reply Last reply Reply Quote 2
                    • glossar
                      glossar last edited by

                      I can’t see the screenshots above - neither on this page nor when clicking on it. All I see is a broken-image-file-icon and “Imgur” next to it.

                      1 Reply Last reply Reply Quote 0
                      • Alan Kilborn
                        Alan Kilborn last edited by

                        Okay, one more try. It could be as simple(!) as changing it to this:

                        (?-s)^.*?\t(?!\t).+?\t.*?$

                        :)

                        1 Reply Last reply Reply Quote 4
                        • glossar
                          glossar last edited by

                          Thanks, that now works like a charm! :)

                          While we are at it, how about building another regex that locates a line that contains no tab? :)

                          Alan Kilborn 1 Reply Last reply Reply Quote 0
                          • Alan Kilborn
                            Alan Kilborn @glossar last edited by

                            @glossar said:

                            regex that locates a line that contains no tab?

                            There might be better ones, but this one seems to work:

                            ^((?!\t).)*$

                            glossar 1 Reply Last reply Reply Quote 3
                            • guy038
                              guy038 last edited by guy038

                              Hi, @glossar, @alan-kilborn, and All,

                              A second solution could be :

                              SEARCH (?-s)(?=.*\t.*\t).+

                              A third solution could be, using the Mark dialog, w/o checking the Bookmark line option :

                              MARK (?-s)\t.*\t


                              Note, @alan-kilborn, that your regex should be changed into :

                              SEARCH (?-s)^.*?\t[^\t\r\n]+\t.*?$

                              To avoid wrong multi-lines match. However, this solution still misses some possibilities !


                              You may test these 3 regexes, above, against the sample test, below :

                              ---------------------------- 1 TEXT block without TAB -----> KO <----- ( because NO tabulation )
                              abcd
                              ---------------------------- 1 TAB  without TEXT ----------> KO <----- ( because ONE tabulation ONLY )
                              	
                              ---------------------------- 2 TABs without TEXT ----------- OK ------
                              		
                              ---------------------------- 3 TABs without TEXT ----------- OK ------
                              			
                              ---------------------------- 1 TAB  + 1 TEXT block --------> KO <----- ( because ONE tabulation ONLY )
                              abcd	
                              	abcd
                              ---------------------------- 1 TAB  + 2 TEXT blocks -------> KO <----- ( because ONE tabulation ONLY )
                              abcd	efgh
                              ---------------------------- 2 TABs + 1 TEXT block --------- OK ------
                              efgh		
                              	efgh	
                              		efgh
                              ---------------------------- 2 TABs + 2 TEXT blocks -------- OK ------
                              abcd	efgh	
                              abcd		ijkm
                              	efgh	ijkl
                              ---------------------------- 2 TABs + 3 TEXT blocks -------- OK ------
                              abcd	efgh	ijkl
                              ---------------------------- 3 TABs + 1 Text block --------- OK ------
                              abcd			
                              	efgh		
                              		ijkl	
                              			mnop
                              ---------------------------- 3 TABs + 2 Text blocks -------- OK ------
                              abcd	efgh		
                              abcd		ijkl	
                              abcd			monp
                              	efgh	ijkl	
                              	efgh		monp
                              		ijkl	monp
                              ---------------------------- 3 TABs + 3 Text blocks -------- OK ------
                              abcd	efgh	ijkm	
                              	efgh	ijkl	mnop
                              ---------------------------- 3 TABs + 4 Text blocks -------- OK ------
                              abcd	efgh	ijkl	mnop
                              

                              Best Regards,

                              guy038

                              1 Reply Last reply Reply Quote 3
                              • PeterJones
                                PeterJones last edited by PeterJones

                                @glossar , @Alan-Kilborn , @Meta-Chuh , et alia,

                                Unfortunately, the (?-s) only changes the behavior of . with respect to newlines; it doesn’t change character classes, so [^\t]+ means “one or more characters that don’t match a TAB, even if those characters are newlines”. By changing the full regex to (?-s)^.*?\t[^\t\r\n]+\t.*?$, I was able to get it to skip lines like @Meta-Chuh 's example of x instead of the TAB. The class [^\t\r\n] means “match one or more characters that isn’t any of TAB, CR (carriage return), or LF (line-feed)”

                                I am not as regex expert as @guy038, so I may be misinterpreting; however, the boost docs say (emphasis mine)

                                Escaped Characters
                                All the escape sequences that match a single character, or a single character class are permitted within a character class definition. For example [[]] would match either of [ or ] while [\W\d] would match any character that is either a “digit”, or is not a “word” character.

                                Since \R doesn’t match a “single character” (it can match a single character or a pair of characters more than one character, see boost’s “Matching Line Endings” section), it doesn’t fall within the allowable escape sequences permitted in the character class.

                                edit: while typing this up, four more posts were made. Hopefully, I still added to the discussion.
                                edit 2: clarify the \R

                                Alan Kilborn 1 Reply Last reply Reply Quote 4
                                • Alan Kilborn
                                  Alan Kilborn @PeterJones last edited by

                                  @PeterJones said:

                                  Hopefully, I still added to the discussion.

                                  You did, and you helped make it an “interesting discussion”. thanks.

                                  1 Reply Last reply Reply Quote 1
                                  • glossar
                                    glossar last edited by

                                    Alan, the second one that finds no-tab :), works, thank you.

                                    Guy and Peter - Thank you for stepping-in! :) Much appreciated!

                                    Have a nice day!

                                    1 Reply Last reply Reply Quote 3
                                    • guy038
                                      guy038 last edited by guy038

                                      Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                                      Here is an other solution, which looks for all contents of lines containing, at least , 2 tabulation chars ( can’t do shorter ! ) :

                                      SEARCH (?-s).*\t.*\t.*

                                      Just for information, an other formulation of the Alan’s regex, which searches lines which do not contain any tabulation char, could be :

                                      SEARCH (?!.*\t)^.+


                                      Negative character classes are often misunderstood, Indeed ! When you’re using, for instance, the negative class character below :

                                      [^<char1><char2><char3>-<char4>]

                                      It will match ANY Unicode character which is DIFFERENT from, either <char1>, <char2> and all characters between <char3> and <char4> included. So, most of the time, it probably matches the \r and \n END of Line characters. To avoid matching these line-break chars, just insert \r and \n, inside the negative class, at any location, after the ^, except in ranges :

                                      [^<char1>\n<char2>\t<char3>-<char4>]

                                      Cheers,

                                      guy038

                                      1 Reply Last reply Reply Quote 3
                                      • glossar
                                        glossar @Alan Kilborn last edited by glossar

                                        @Alan-Kilborn said:

                                        @glossar said:

                                        regex that locates a line that contains no tab?

                                        There might be better ones, but this one seems to work:

                                        ^((?!\t).)*$

                                        Hi @alan-kilborn,
                                        Is it possible for you to modify this regex so shat it should skip blank lines, i.e. the ones containing no characters at all, just (if applicable, ^ and) \r\n. Currently the regex finds blank lines as well since they , too, meet the criteria “no-tab”.

                                        Thanks in advance!

                                        Alan Kilborn 1 Reply Last reply Reply Quote 0
                                        • guy038
                                          guy038 last edited by guy038

                                          Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                                          I may be mistaken but I think that the regex (?!.*\t)^.+, of my previous post, just meet your needs, doesn’t it ?

                                          Cheers,

                                          guy038

                                          1 Reply Last reply Reply Quote 4
                                          • Alan Kilborn
                                            Alan Kilborn @glossar last edited by

                                            @glossar said:

                                            Is it possible for you to modify this regex so shat it should skip blank lines

                                            So we should look at what the original means:

                                            ^((?!\t).)*$

                                            It says (basically) to match zero or more occurrences (because of the use of *) of anything that is not TAB. If we change it to match ONE or more occurrences (we’re going to change * to + to do this) of anything that is not TAB). Because we have to match at least ONE thing, empty/blank lines are no longer matched:

                                            ^((?!\t).)+$

                                            Which is basically what @guy038 said, but I wanted to elaborate a bit!

                                            1 Reply Last reply Reply Quote 2
                                            • First post
                                              Last post
                                            Copyright © 2014 NodeBB Forums | Contributors