Tabs to spaces

    The Fartman
    last edited by The Fartman May 30, 2016, 11:52 AM May 30, 2016, 11:51 AM

    Hi.
    For a long time, I used single spaces to indent my files.
    Now I have decided to use tabs instead.
    But I’ve got a problem: when I use the “Space to TAB” menu item, Notepad++ only replaces spaces starting from the second space in each line. The first space in every line is left untouched, even though the tab size is set to “1”.

    Screenshots:
    http://s018.radikal.ru/i514/1605/1b/d28b37060914.png
    http://s019.radikal.ru/i621/1605/b4/152d0a97d0d7.png
    http://s020.radikal.ru/i704/1605/2a/3f6645c8b9dc.png

    This is not what I want. Why is Notepad++ acting this way? What should I do to replace every space character at the beginning of every line with a tab?

    P.S. Sorry for my English, I’m not a native speaker…

      guy038
      last edited by guy038 May 31, 2016, 10:01 PM May 31, 2016, 9:59 PM

      Hello The Fartman,

      Ah, yes! You’re right! If, in Settings - Preferences - Tab Settings, you set the default width of a tabulation character, for a given language, to any number T > 1, the replacement of spaces with tabs, through the options Edit - Blank Operations - Space to TAB …, is almost correct.

      I say “almost” because I noticed that a single space located at a position k*T, for any integer k, is left as is, although it could have been changed into a tabulation character one character wide!

      But if you set the tabulation size T to 1 and select one of the two Space to TAB … options, the layout of the text is preserved by inserting a run of one-character-wide tabulations, systematically followed by a final space character! It looks like a (small) bug :-((

      If this bothers you, just run the following simple S/R, in Extended or Regular expression mode, to change that space into a tabulation character. Note that it replaces every space in its scope, not only the leading ones:

      Find what : \x20
      Replace with : \t


      By the way, could you explain the real benefit of changing every space into a one-character-wide tabulation? As a reminder, both belong to the Unicode range of horizontal blank characters, which you can search for with the \h syntax and which matches any of the 3 characters below:

      • \t or \x09 ( Tabulation or HT )
      • \x20 ( Space or SP )
      • \xa0 ( No-Break Space or NBSP )

      Best Regards,

      guy038

        The Fartman
        last edited by The Fartman Jun 1, 2016, 2:58 PM Jun 1, 2016, 2:57 PM

        But if you set the tabulation size T to 1 and select one of the two Space to TAB … options, the layout of the text is preserved by inserting a run of one-character-wide tabulations, systematically followed by a final space character! It looks like a (small) bug :-((

        Yeah, and that really bugs me. It looks like the feature that was supposed to let users migrate from space indentation to tab indentation doesn’t work :(
        If it’s actually a bug, I hope it will be fixed in the next release.

        By the way, could you explain the real benefit of changing every space into a one-character-wide tabulation?

        I’m not sure I understand your question, but I guess you’re referring to my screenshot where I picked “Space to TAB (All)” instead of “Space to TAB (Leading)”. In fact, I did that only because the “… Leading” menu item didn’t work either, so I was just demonstrating that even “… All” doesn’t do what is expected.

        Anyway, I just want to change the indentation of my code (not replace every space with a tab): for years I used a single space, and now I want to use a tab, and I want Notepad++ to do it for me. Manually re-indenting thousands of lines would be a little exhausting.

        For now, I use this regular expression to change the indentation: find "^ |\G(?!^) ", replace with “\t”. Having that menu item fixed would be better, though.

        P.S.: This thread should really be called “Spaces to tabs”; I mistyped.

          guy038
          last edited by guy038 Jun 4, 2016, 8:41 PM Jun 4, 2016, 8:27 PM

          Hi, The Fartman,

          Sorry for my late reply, but we had a hard week at work!

          I’m afraid there’s no solution unless the developer who implemented the TAB to Space and Space to TAB… features takes a look at that code! So, in the meantime, the only way seems to be a specific S/R, as you did :-)

          By the way, to restrict the replacement to leading spaces, you found a very clever regex!! Indeed, the \G syntax is rarely used, but it can be very useful in some cases! In the postscript at the end of this post, I describe a use of the \G syntax to detect a range of true codons in an RNA sequence :-))


          For the record, the \G form is an assertion which matches at any of the 3 locations below:

          • The very beginning of a file
          • The location of the cursor after the previous match of the regex engine
          • The location of the cursor, when moved there on purpose by the user

          However, I think it would be better to search for any horizontal blank character and shorten your search regex ^ |\G(?!^) to (^|\G)\h. Indeed, this new regex also handles lines which contain both leading tabs and leading spaces! So, we obtain the S/R:

          Find what : (^|\G)\h

          Replace with : \t

          Notes :

          • This S/R matches the first blank character at the beginning of a line and changes it into a tabulation

          • Then, it matches the next blank character and changes it, again, into a tabulation

          • As soon as the next blank character is NOT adjacent to the previous one ( because of some non-blank characters in between ), the replacement process stops, due to the \G syntax

          • Then, on matching the first blank character of the next line, the S/R process restarts, and so on…

          In the end, this S/R only changes leading blank characters into tabulation characters
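For readers who want to check the effect of that S/R outside Notepad++, here is a minimal Python sketch. It is not the Notepad++ implementation: Python's standard re module has no \G, so the sketch matches each line's whole leading run of blanks instead of chaining single matches. The function name and the sample text are made up for illustration, and \h is assumed to mean exactly the three characters listed earlier in the thread.

```python
import re

# Horizontal blanks named in the thread: tab, space and no-break space.
# Treating \h as exactly this set is an assumption carried over from the post.
BLANK = "[\t \u00a0]"

def leading_blanks_to_tabs(text: str) -> str:
    """Replace every blank in each line's leading run with one tab."""
    # ^ with re.MULTILINE anchors at each line start; the run that follows
    # plays the role of the (^|\G)\h chain in the Notepad++ search.
    pattern = re.compile(rf"^{BLANK}+", re.MULTILINE)
    return pattern.sub(lambda m: "\t" * len(m.group()), text)

# Hypothetical sample: two leading spaces, then a space + tab mix.
sample = "  if x:\n \tprint(x)  # inner spaces stay\nend"
print(repr(leading_blanks_to_tabs(sample)))
```

Blanks that follow the first non-blank character of a line are left alone, which is the behavior the notes above describe.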


          It took me a long time to find an equivalent regex which does NOT use the \G syntax. But, as you can see, the regex below is not as elegant as our previous one!

          Find what : (\h)|[^\h\r\n].*

          Replace with : (?1\t:$0)

          Notes :

          • As long as a blank character (\h) is matched, it is replaced with a tabulation character ( group 1 exists )

          • As soon as a non-blank character is found, the regex matches all the rest of the current line, [^\h\r\n].*, and rewrites the entire match $0 ( group 1 does NOT exist )

          • On the next line, the process restarts and first tries to match possible blank characters, and so on!
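The conditional replacement (?1\t:$0) has no direct counterpart in Python's re module, but a replacement function can play the same role. The sketch below is only an approximation of the S/R above, with \h again assumed to be tab, space and no-break space; the function name is hypothetical.

```python
import re

# (\h)|[^\h\r\n].*  with \h spelled out as tab, space, no-break space (assumed set).
PATTERN = re.compile("([\t \u00a0])|[^\t \u00a0\r\n].*")

def blanks_to_tabs_without_g(text: str) -> str:
    """Turn leading blanks into tabs without relying on \\G."""
    # Mirrors (?1\t:$0): emit a tab when group 1 took part in the match,
    # otherwise write the whole match back unchanged.
    return PATTERN.sub(
        lambda m: "\t" if m.group(1) is not None else m.group(0),
        text,
    )

print(repr(blanks_to_tabs_without_g("  a b\n\tc  d")))
```

The second alternative swallows the rest of the line, so blanks after the first non-blank character never get a chance to match on their own.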


          An improvement could be to record a macro for that specific S/R.

          Alternatively, you may add the following lines to your shortcuts.xml configuration file, using an editor OTHER than N++, and then re-open Notepad++. I chose the Alt + Ctrl + Shift + T shortcut, for (T)ab, but you may change it as you like!

          <Macro name="S2T" Ctrl="yes" Alt="yes" Shift="yes" Key="84">
              <Action type="0" message="2316" wParam="0" lParam="0" sParam="" />
              <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
              <Action type="3" message="1601" wParam="0" lParam="0" sParam="(^|\G)\h" />
              <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
              <Action type="3" message="1602" wParam="0" lParam="0" sParam="\t" />
              <Action type="3" message="1702" wParam="0" lParam="512" sParam="" />
              <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
          </Macro>
          

          Here is a short explanation :

          • Message “2316” moves the cursor to the very beginning of the current file
          • Message “1700” initializes the search/replacement process
          • Message “1601” contains the search pattern
          • Message “1625” selects the search mode
          • Message “1602” contains the replacement pattern
          • Message “1702” represents the sum of the search options
          • Message “1701” represents the search/replacement command code to execute
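To double-check a macro before pasting it into shortcuts.xml, the fragment can be parsed and listed step by step. The Python sketch below does only that; it does not talk to Notepad++. The meanings in the table are the ones given in the list above, and reading Key="84" as the letter T assumes the key code coincides with the ASCII code, which matches the shortcut named in the post.

```python
import xml.etree.ElementTree as ET

# The macro from the post, kept in a raw string so the backslashes survive.
MACRO_XML = r'''<Macro name="S2T" Ctrl="yes" Alt="yes" Shift="yes" Key="84">
    <Action type="0" message="2316" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(^|\G)\h" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="\t" />
    <Action type="3" message="1702" wParam="0" lParam="512" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
</Macro>'''

# Meanings as described in the post, not taken from the Notepad++ sources.
MEANING = {
    "2316": "move cursor to start of file",
    "1700": "initialize search/replace",
    "1601": "search pattern",
    "1625": "search mode",
    "1602": "replacement pattern",
    "1702": "sum of search options",
    "1701": "command to execute",
}

macro = ET.fromstring(MACRO_XML)
# Assumption: the Key attribute equals the ASCII code of the shortcut letter.
shortcut_key = chr(int(macro.get("Key")))
steps = [
    (a.get("message"), a.get("lParam"), a.get("sParam"))
    for a in macro.iter("Action")
]

print("shortcut letter:", shortcut_key)
for message, lparam, sparam in steps:
    print(f"{message}: {MEANING[message]} (lParam={lparam}, sParam={sparam!r})")
```

A parse error here usually means a quote or an angle bracket went missing while copying the fragment.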

          Refer to the documentation below for detailed information:

          http://docs.notepad-plus-plus.org/index.php/Editing_Configuration_Files#Search_.2F_Replace_encoding

          Note: In the documentation, the meaning given for flag value 512, when the message is “1702”, is wrong! The correct meaning is “Search goes downwards”!

          Best Regards,

          guy038

          P.S. :

          Here is, below, an mRNA sequence, translated into a protein by a ribosome of a living cell ( found somewhere in a regex documentation, to illustrate the usefulness of the \G feature ! ) :

          ....AUGGGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAAUAG
          

          If we remove the start codon AUG and the sequence preceding it, as well as the stop codon UAG, we obtain the sequence:

          GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA
          ¯¯¯     ¨¨¨       ¯¯¯      ¯¯¯     ¨¨¨  ¨¨¨                    ¯¯¯        ¨¨¨         ¨¨¨
          1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
          

          In that RNA sequence, let’s consider the nucleotide triplet GGU. You can easily see that:

          • At positions 1, 19, 28 and 64, this sequence is a TRUE codon

          • At positions 9, 36, 41, 75 and 87, this sequence is NOT a codon

          Now :

          • Place the cursor just before the RNA sequence GGUCGA… ( IMPORTANT)

          • To understand the effect of the \G feature, try, successively, the different regexes, below

            .*?GGU which matches the SHORTEST range of characters, up to a GGU triplet

            (\u\u\u)*?GGU which matches the SHORTEST range of TRIPLETS, up to a GGU triplet

            \G(\u\u\u)*?GGU which matches the SHORTEST range of CODONS, up to a GGU codon

          Notes :

          • The first regex matches any range of capital letters, whatever its size, followed by the string GGU

          • The second regex matches any range containing 3*n capital letters, followed by the string GGU ( n >= 0 )

          • The third regex matches any range containing 3*n capital letters, followed by the string GGU, itself separated from the starting point by 3*m capital letters, so the GGU sequence is a codon ( n and m >= 0 )

          Similarly :

          .*GGU          which matches the LONGEST range of characters, up to a GGU triplet

          (\u\u\u)*GGU   which matches the LONGEST range of TRIPLETS,   up to a GGU triplet

          \G(\u\u\u)*GGU which matches the LONGEST range of CODONS,     up to a GGU codon
          


          For further information on that topic, refer to the links below:

          https://en.wikipedia.org/wiki/Codon

          https://en.wikipedia.org/wiki/Start_codon

          https://en.wikipedia.org/wiki/Stop_codon

            The Fartman
            Jun 7, 2016, 6:04 PM

            Hi, guy038,

            I’m afraid there’s no solution unless the developer who implemented the TAB to Space and Space to TAB… features takes a look at that code! So, in the meantime, the only way seems to be a specific S/R, as you did :-)

            Yeah, but the question is: does this thread count as a bug report? I assume this is the official Notepad++ forum and that developers keep an eye on it, so will the behavior of the “Space to TAB” menu item be fixed in the next release… or not?

            However, I think it would be better to search for any horizontal blank character and shorten your search regex ^ |\G(?!^) to (^|\G)\h. Indeed, this new regex also handles lines which contain both leading tabs and leading spaces! So, we obtain the S/R:
            Find what : (^|\G)\h
            Replace with : \t

            Thanks, that’s a much better regexp, shorter and clearer. The macro is nice too.

              guy038
              last edited by guy038 Mar 6, 2023, 7:05 AM Mar 5, 2023, 1:06 PM

              Hello, @the-fartman and All,

              I realize that I made some syntax errors in the last part of my last and very old post ( After the P.S. indication )

              So, here is the updated version :

              P.S. :

              Here is, below, an mRNA sequence, translated into a protein by a ribosome of a living cell ( found somewhere in a regex documentation, to illustrate the usefulness of the \G regex feature ! ) :

              ....AUGGGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAAUAG
              

              If we remove the start codon AUG and the sequence preceding it, as well as the stop codon UAG, we obtain the sequence:

              GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA
              ¯¯¯     ¨¨¨       ¯¯¯      ¯¯¯     ¨¨¨  ¨¨¨                    ¯¯¯        ¨¨¨         ¨¨¨
              1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
              

              In that RNA sequence, let’s consider the nucleotide triplet GGU. You can easily see that:

              • At positions 1, 19, 28 and 64, this sequence is a TRUE codon

              • At positions 9, 36, 41, 75 and 87, this sequence is NOT a codon

              Now :

              • Place the cursor right before the RNA sequence GGUCGA… ( IMPORTANT)

              • To understand the effect of the \G feature, try, successively, the 3 regexes, below :

                  .*?GGU           which matches the SHORTEST range of characters, up to a GGU triplet

                  (\u\u\u)*?GGU    which matches the SHORTEST range of TRIPLETS,   up to a GGU triplet

                  \G(\u\u\u)*?GGU  which matches the SHORTEST range of CODONS,     up to a GGU codon
              

              Notes :

              • The first regex matches any range of capital letters, whatever its size, followed by the first string GGU

              • The second regex matches any range containing 3*n capital letters, followed by the first string GGU ( n >= 0 )

              • The third regex matches any range containing 3*n capital letters, followed by the first string GGU, itself separated from the starting point by 3*m capital letters, so the GGU sequence is a codon ( n and m >= 0 )

              Similarly, try these 3 regexes below :

                  .*GGU            which matches the LONGEST range of characters, up to a GGU triplet

                  (\u\u\u)*GGU     which matches the LONGEST range of TRIPLETS,   up to a GGU triplet

                  \G(\u\u\u)*GGU   which matches the LONGEST range of CODONS,     up to a GGU codon
              

              Notes :

              • The first regex matches any range of capital letters, whatever its size, followed by the last string GGU

              • The second regex matches any range containing 3*n capital letters, followed by the last string GGU ( n >= 0 )

              • The third regex matches any range containing 3*n capital letters, followed by the last string GGU, itself separated from the starting point by 3*m capital letters, so the GGU sequence is a codon ( n and m >= 0 )
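The difference between a plain search for GGU and the \G-anchored search can be reproduced in Python. This is a sketch under one stated substitution: Python's standard re module does not support \G, so the anchor is emulated with Pattern.match(string, pos), which only matches at the given position, restarted at the end of the previous match. [A-Z] stands in for \u, and all variable names are illustrative.

```python
import re

# The sequence from the post, start and stop codons already removed.
SEQ = "GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA"

# Every occurrence of GGU, in frame or not (1-based start positions).
all_hits = [m.start() + 1 for m in re.finditer("GGU", SEQ)]

# Emulation of \G(\u\u\u)*?GGU: each attempt must begin exactly where the
# previous match ended, so the reading frame of three letters is never lost.
codon = re.compile(r"(?:[A-Z]{3})*?GGU")
codon_hits = []
pos = 0
while (m := codon.match(SEQ, pos)) is not None:
    codon_hits.append(m.end() - 3 + 1)  # 1-based start of the matched GGU
    pos = m.end()

print("all GGU triplets:", all_hits)
print("GGU codons only :", codon_hits)
```

The first list contains the nine positions quoted above, and the second keeps only the four that fall on a codon boundary.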

              For further information on that topic, refer to the links below:

              https://en.wikipedia.org/wiki/Codon

              https://en.wikipedia.org/wiki/Start_codon

              https://en.wikipedia.org/wiki/Stop_codon
