How to find two or more non-consecutive tabs in a line?

  • G
    glossar
    last edited by Apr 1, 2019, 12:19 PM

    Hi all,
    How can I find a line that contains 2 or more non-consecutive tabs? One of the tabs may or may not be at the beginning and/or at the end of the line.

    I tried to adapt a regex from a similar task, but with no success: "^.(\t){2,}.\r\n" and "^.(?:\t*){2,}\r\n".

    Thanks in advance!

    • A
      Alan Kilborn @glossar
      last edited by Apr 1, 2019, 12:49 PM

      @glossar

      How about this?:

      (?-s)^.*?\t[^\t]+\t.*?$

      • G
        glossar
        last edited by Apr 1, 2019, 1:14 PM

        Hi Alan,

        Thank you, but sadly it won’t work. It finds only two tabs, each in every other line, at least in my file, whereas it should locate any line that contains 2 or more tabs (e.g.: blah [tab] blah blah more blah [tab] (blah blah [tab] blah)… ).

        • A
          Alan Kilborn @Alan Kilborn
          last edited by Apr 1, 2019, 1:18 PM

          This raises maybe an interesting discussion: When are characters inside character class notation, that is, between [ and ], non-literal? On first crafting the above regex, I thought, this isn’t going to work, it is going to look for \ or t separately, not “tab” characters. But lo and behold, it does look for tabs. What are the rules for this?

          I know that [\R] will match \ or R and not match \R but that may be a special case and invalid because it can match possibly 2 characters, not just one.

          But there must be some general rules on what is special inside […] and [^…] … besides the “specialness” of - when used as a range operator, for example [a-z], and the special way needed to get ] to be included in the set…

          • A
            Alan Kilborn @glossar
            last edited by Apr 1, 2019, 1:28 PM

            @glossar said:

            Thank you but sadly it won’t work.

            Hmmm. Works for me with a Mark operation shown here:

            Imgur

            I copied your text from this thread, did a regex replace on it for \[tab\] with \t…and then applied the regex specified earlier to redmark the text.

            • G
              glossar
              last edited by Apr 1, 2019, 1:41 PM

              I can confirm that it finds a line that contains two tabs, but if a line doesn’t meet the criteria, it looks further (greedy, you say? :) ) and hence matches the following line together with it, which in the end looks like “every other line”. I’m pretty sure it runs past the \r\n of a line if that line contains only one tab. Can you limit the regex so that it searches within a single line only (by line, I mean anything between ^ and \r\n)?

              • A
                Alan Kilborn @glossar
                last edited by Alan Kilborn Apr 1, 2019, 1:46 PM Apr 1, 2019, 1:46 PM

                @glossar

                Ah, yes, okay, that makes sense. The [^\t]+ will capture across line-boundaries. At this point I will bow out and let the regex master @guy038 step in… :)

                And maybe he can comment on my “interesting discussion” post above as well.

                • M
                  Meta Chuh moderator @glossar
                  last edited by Meta Chuh Apr 1, 2019, 1:48 PM Apr 1, 2019, 1:46 PM

                  maybe a screenshot helps:
                  Imgur

                  • G
                    glossar
                    last edited by Apr 1, 2019, 1:54 PM

                    I can’t see the screenshots above - neither on this page nor when clicking on it. All I see is a broken-image-file-icon and “Imgur” next to it.

                    • A
                      Alan Kilborn
                      last edited by Apr 1, 2019, 1:54 PM

                      Okay, one more try. It could be as simple(!) as changing it to this:

                      (?-s)^.*?\t(?!\t).+?\t.*?$

                      :)
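
The difference between the first attempt and this revision can be reproduced outside Notepad++. The following is only a rough sketch in Python, whose `re` module is not the Boost engine Notepad++ uses: `(?-s)` is dropped because `.` already excludes `\n` by default there, `re.MULTILINE` is added so that `^` and `$` work per line, and the sample text with `\n` line endings is an assumption made for illustration.

```python
import re

# Hypothetical sample: a one-tab line, a two-tab line, and a line without tabs.
text = "one\ttab only\nx\ty\tz\nno tabs here\n"

# First attempt: the class [^\t]+ is free to consume the line break.
first = re.compile(r"^.*?\t[^\t]+\t.*?$", re.MULTILINE)

# Revised attempt: "." cannot cross "\n", so the match stays on one line.
revised = re.compile(r"^.*?\t(?!\t).+?\t.*?$", re.MULTILINE)

first_matches = [m.group(0) for m in first.finditer(text)]
revised_matches = [m.group(0) for m in revised.finditer(text)]

print(first_matches)    # one match that spans the first two lines
print(revised_matches)  # only the line that really holds two tabs
```

In this sketch the first pattern glues the one-tab line to the line after it, which is the "every other line" effect described above, while the revised pattern reports only the two-tab line.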

                      • G
                        glossar
                        last edited by Apr 1, 2019, 1:56 PM

                        Thanks, that now works like a charm! :)

                        While we are at it, how about building another regex that locates a line that contains no tab? :)

                        • A
                          Alan Kilborn @glossar
                          last edited by Apr 1, 2019, 2:01 PM

                          @glossar said:

                          regex that locates a line that contains no tab?

                          There might be better ones, but this one seems to work:

                          ^((?!\t).)*$

                          • G
                            guy038
                            last edited by guy038 Nov 19, 2022, 2:36 AM Apr 1, 2019, 2:01 PM

                            Hi, @glossar, @alan-kilborn, and All,

                            A second solution could be :

                            SEARCH (?-s)(?=.*\t.*\t).+

                            A third solution could be, using the Mark dialog, w/o checking the Bookmark line option :

                            MARK (?-s)\t.*\t


                            Note, @alan-kilborn, that your regex should be changed to:

                            SEARCH (?-s)^.*?\t[^\t\r\n]+\t.*?$

                            This avoids wrong multi-line matches. However, this solution still misses some cases, for instance a line whose only tabs are adjacent!


                            You may test the 3 regexes above against the sample text below:

                            ---------------------------- 1 TEXT block without TAB -----> KO <----- ( because NO tabulation )
                            abcd
                            ---------------------------- 1 TAB  without TEXT ----------> KO <----- ( because ONE tabulation ONLY )
                            	
                            ---------------------------- 2 TABs without TEXT ----------- OK ------
                            		
                            ---------------------------- 3 TABs without TEXT ----------- OK ------
                            			
                            ---------------------------- 1 TAB  + 1 TEXT block --------> KO <----- ( because ONE tabulation ONLY )
                            abcd	
                            	abcd
                            ---------------------------- 1 TAB  + 2 TEXT blocks -------> KO <----- ( because ONE tabulation ONLY )
                            abcd	efgh
                            ---------------------------- 2 TABs + 1 TEXT block --------- OK ------
                            efgh		
                            	efgh	
                            		efgh
                            ---------------------------- 2 TABs + 2 TEXT blocks -------- OK ------
                            abcd	efgh	
                            abcd		ijkm
                            	efgh	ijkl
                            ---------------------------- 2 TABs + 3 TEXT blocks -------- OK ------
                            abcd	efgh	ijkl
                            ---------------------------- 3 TABs + 1 Text block --------- OK ------
                            abcd			
                            	efgh		
                            		ijkl	
                            			mnop
                            ---------------------------- 3 TABs + 2 Text blocks -------- OK ------
                            abcd	efgh		
                            abcd		ijkl	
                            abcd			monp
                            	efgh	ijkl	
                            	efgh		monp
                            		ijkl	monp
                            ---------------------------- 3 TABs + 3 Text blocks -------- OK ------
                            abcd	efgh	ijkm	
                            	efgh	ijkl	mnop
                            ---------------------------- 3 TABs + 4 Text blocks -------- OK ------
                            abcd	efgh	ijkl	mnop
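
A few of the sample lines above can also be checked programmatically. This is a minimal sketch in Python rather than Notepad++ itself, so `(?-s)` is omitted (Python's `.` excludes `\n` by default) and each line is tested on its own. The list `lines`, the dictionary `patterns` and its labels are hypothetical names chosen for this illustration.

```python
import re

# A handful of lines taken from the sample: 0, 1, 2, 1, 2 and 2 tabs.
lines = ["abcd", "\t", "\t\t", "abcd\tefgh", "efgh\t\t", "abcd\tefgh\tijkl"]

patterns = {
    "lookahead": re.compile(r"(?=.*\t.*\t).+"),             # SEARCH solution
    "mark": re.compile(r"\t.*\t"),                          # MARK solution
    "alan_fixed": re.compile(r"^.*?\t[^\t\r\n]+\t.*?$"),    # corrected class
}

# For every pattern, record which sample lines it finds.
results = {
    name: [bool(rx.search(line)) for line in lines]
    for name, rx in patterns.items()
}

for name, hits in results.items():
    print(name, hits)
```

Under these assumptions the first two patterns accept every line with at least two tabs, while the corrected class version rejects the lines whose two tabs sit next to each other.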
                            

                            Best Regards,

                            guy038

                            • PeterJonesP
                              PeterJones
                              last edited by PeterJones Apr 1, 2019, 2:06 PM Apr 1, 2019, 2:03 PM

                              @glossar , @Alan-Kilborn , @Meta-Chuh , et alia,

                              Unfortunately, the (?-s) only changes the behavior of . with respect to newlines; it doesn’t change character classes, so [^\t]+ means “one or more characters that don’t match a TAB, even if those characters are newlines”. By changing the full regex to (?-s)^.*?\t[^\t\r\n]+\t.*?$, I was able to get it to skip lines like @Meta-Chuh 's example of x instead of the TAB. The expression [^\t\r\n]+ means “match one or more characters, none of which is TAB, CR (carriage return), or LF (line-feed)”.
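
The same behavior of negated classes can be seen in other regex engines. Here is a small sketch in Python (an assumption for illustration, not the Boost engine), using a hypothetical two-line string with a `\r\n` line ending:

```python
import re

# Two lines separated by CR LF, one tab on each line.
text = "a\tb\r\nc\td"

# [^\t] excludes only TAB, so the run "b\r\nc" is taken as one piece.
loose = re.findall(r"[^\t]+", text)

# Adding \r and \n to the class stops every run at the line break.
strict = re.findall(r"[^\t\r\n]+", text)

print(loose)
print(strict)
```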

                              I am not as much of a regex expert as @guy038, so I may be misinterpreting; however, the Boost docs say (emphasis mine):

                              Escaped Characters
                              All the escape sequences that match a single character, or a single character class are permitted within a character class definition. For example [\[\]] would match either of [ or ] while [\W\d] would match any character that is either a “digit”, or is not a “word” character.

                              Since \R doesn’t match a “single character” (it can match a single character or a pair of characters, see Boost’s “Matching Line Endings” section), it doesn’t fall within the allowable escape sequences permitted in the character class.
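
The rule quoted above can be tried out with a short sketch. It uses Python's `re` module, which is an assumption made for illustration: Python follows the same convention for single-character escapes inside a class, but it is a different engine from Boost and has no `\R` escape, so that case is left out.

```python
import re

# The escape \t keeps its meaning inside [...]: this class holds one TAB.
# The input ends with a literal backslash followed by "t", which is ignored.
tab_hits = re.findall(r"[\t]", "a\tb\\t")

# Escaped brackets are ordinary members of the class.
bracket_hits = re.findall(r"[\[\]]", "x[1]")

# Class escapes combine: a digit, or any non-word character.
mixed_hits = re.findall(r"[\W\d]", "a1_ b")

print(tab_hits, bracket_hits, mixed_hits)
```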

                              edit: while typing this up, four more posts were made. Hopefully, I still added to the discussion.
                              edit 2: clarify the \R

                              • A
                                Alan Kilborn @PeterJones
                                last edited by Apr 1, 2019, 2:06 PM

                                @PeterJones said:

                                Hopefully, I still added to the discussion.

                                You did, and you helped make it an “interesting discussion”. thanks.

                                • G
                                  glossar
                                  last edited by Apr 1, 2019, 2:06 PM

                                  Alan, the second one that finds no-tab :), works, thank you.

                                  Guy and Peter - Thank you for stepping-in! :) Much appreciated!

                                  Have a nice day!

                                  • G
                                    guy038
                                    last edited by guy038 Apr 1, 2019, 3:38 PM Apr 1, 2019, 2:52 PM

                                    Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                                    Here is another solution, which matches the entire contents of any line containing at least 2 tab characters (can’t do shorter!):

                                    SEARCH (?-s).*\t.*\t.*

                                    Just for information, another formulation of Alan’s regex, which searches for lines that do not contain any tab character, could be:

                                    SEARCH (?!.*\t)^.+


                                    Negated character classes are often misunderstood, indeed! When you’re using, for instance, the negated character class below:

                                    [^<char1><char2><char3>-<char4>]

                                    It will match ANY Unicode character which is DIFFERENT from <char1>, from <char2>, and from every character between <char3> and <char4> inclusive. So, most of the time, it also matches the \r and \n end-of-line characters. To avoid matching these line-break chars, just insert \r and \n inside the negated class, at any location after the ^, except within a range:

                                    [^<char1>\n<char2>\r<char3>-<char4>]

                                    Cheers,

                                    guy038

                                    • G
                                      glossar @Alan Kilborn
                                      last edited by glossar Apr 7, 2019, 9:00 AM Apr 7, 2019, 8:58 AM

                                      @Alan-Kilborn said:

                                      @glossar said:

                                      regex that locates a line that contains no tab?

                                      There might be better ones, but this one seems to work:

                                      ^((?!\t).)*$

                                      Hi @alan-kilborn,
                                      Is it possible for you to modify this regex so that it should skip blank lines, i.e. the ones containing no characters at all, just (if applicable, ^ and) \r\n? Currently the regex finds blank lines as well, since they, too, meet the “no-tab” criterion.

                                      Thanks in advance!

                                      • G
                                        guy038
                                        last edited by guy038 Apr 7, 2019, 10:25 PM Apr 7, 2019, 7:30 PM

                                        Hi, @glossar, @alan-kilborn, @meta-chuh, @peterjones, and All,

                                        I may be mistaken, but I think that the regex (?!.*\t)^.+ from my previous post meets your needs, doesn’t it?

                                        Cheers,

                                        guy038

                                        • A
                                          Alan Kilborn @glossar
                                          last edited by Apr 8, 2019, 12:38 PM

                                          @glossar said:

                                          Is it possible for you to modify this regex so that it should skip blank lines

                                          So we should look at what the original means:

                                          ^((?!\t).)*$

                                          It says (basically) to match zero or more occurrences (because of the use of *) of anything that is not TAB. Now suppose we change it to match ONE or more occurrences (by changing * to +) of anything that is not TAB. Because we then have to match at least ONE thing, empty/blank lines are no longer matched:

                                          ^((?!\t).)+$
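
The effect of switching * to + can be checked with a short sketch. It is written in Python with `re.MULTILINE` (an assumption, since Notepad++ uses the Boost engine), and the sample text with a blank second line and `\n` line endings is hypothetical. The pattern from @guy038's earlier post is included for comparison.

```python
import re

# Four lines: no tab, blank, one tab, no tab.
text = "no tabs\n\nhas\ttab\nlast line"

star = re.compile(r"^((?!\t).)*$", re.MULTILINE)   # zero or more non-TAB chars
plus = re.compile(r"^((?!\t).)+$", re.MULTILINE)   # one or more non-TAB chars
guy = re.compile(r"(?!.*\t)^.+", re.MULTILINE)     # lookahead formulation

star_hits = [m.group(0) for m in star.finditer(text)]
plus_hits = [m.group(0) for m in plus.finditer(text)]
guy_hits = [m.group(0) for m in guy.finditer(text)]

print(star_hits)  # includes an empty string for the blank line
print(plus_hits)
print(guy_hits)
```

With this sample, only the * version reports the blank line; the + version and the lookahead formulation return the same two non-empty, tab-free lines.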

                                          Which is basically what @guy038 said, but I wanted to elaborate a bit!

                                          The Community of users of the Notepad++ text editor.