Tabs to spaces

The Fartman

Hi.
For a long time, I used a single space per indent level in my files.
Now I have decided to use tabs instead.
But I’ve got a problem: when I use the “Space to TAB” menu item, Notepad++ only replaces spaces starting from the second space in each line. The first space in every line is left untouched, even though the tab size is set to “1”.

Screenshots:
http://s018.radikal.ru/i514/1605/1b/d28b37060914.png
http://s019.radikal.ru/i621/1605/b4/152d0a97d0d7.png
http://s020.radikal.ru/i704/1605/2a/3f6645c8b9dc.png

This is not what I want. Why is Notepad++ acting this way? What should I do to replace each and every space character at the beginning of every line with a tab?

P.S. Sorry for my English, I’m not a native speaker…

guy038

Hello The Fartman,

Ah, yes ! You’re right about it ! If, in Settings - Preferences - Tab Settings, you set the default width of a tabulation character, for a given language, to any number T > 1, the substitution of spaces with tabs, with the options Edit - Blank Operations - Space to TAB…, is almost correct.

I said “almost”, as I noticed that, when a single space is located at a position k*T, whatever the integer k, it could have been changed into a tabulation character, one character wide !

But, if you set the tabulation size T to the value 1 and select one of the two options Space to TAB…, the layout of the text is kept by inserting a run of tabulations, each one character wide, systematically followed by a last space character ! It looks like a (small) bug :-((
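
For comparison, here is a minimal Python sketch of what one would expect a leading space-to-tab conversion to do for a tab size T. It is only an assumption about the intended behaviour, not Notepad++’s actual code, and the function name is made up for the illustration :

```python
def leading_spaces_to_tabs(line: str, tab_size: int) -> str:
    """Convert the leading spaces of one line into tabs of width tab_size."""
    body = line.lstrip(" ")
    n_spaces = len(line) - len(body)
    # Each full group of tab_size spaces becomes one tab;
    # any leftover spaces (fewer than tab_size) are kept as spaces.
    n_tabs, leftover = divmod(n_spaces, tab_size)
    return "\t" * n_tabs + " " * leftover + body


print(repr(leading_spaces_to_tabs("   x", 1)))
print(repr(leading_spaces_to_tabs("      x", 4)))
```

With a tab size of 1, the division never leaves a remainder, so, under this reading, no leading space should survive the conversion.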

If this bothers you, just perform the following simple S/R, in extended or regex mode, to change that space into a tabulation character :

Find what : \x20
Replace with : \t

By the way, could you explain to me the real interest of changing every space into a tabulation, one character wide ? As a reminder, both belong to the Unicode horizontal blank characters range, which you may search for with the \h syntax, and which matches any of the 3 characters, below :

\t or \x09 ( Tabulation or HT )
\x20 ( Space or SP )
\xa0 ( No-Break Space or NBSP )

Best Regards,

guy038

The Fartman

But, if you set the tabulation size T to the value 1 and that you select one of the two options Space to TAB …, the arrangement of text is kept by adding any range of tabulations, of one-character long, systematically followed by a last space character ! It looks like a (small) bug :-((

Yeah, and that really bugs me. It looks like the feature that was supposed to let users migrate from space indentation to tab indentation doesn’t work :(
If it’s actually a bug, I’m looking forward to it being fixed in the next release.

By the way, could you explain to me about the real interest in changing any space by a tabulation of one-character long ?

Not sure about your question, but I guess you’re referring to my screenshot where I picked “Space to TAB (All)” instead of “Space to TAB (Leading)”. In fact, I did that only because the “… Leading” menu item didn’t work either, so I just demonstrated that even “… All” doesn’t perform the expected action.

Anyway, I just want to change the indentation of my code (not replace every space with a tab): for years I used a single space, and now I want to use a tab, and I want Notepad++ to do it for me. Manually re-indenting thousands of lines would be a little exhausting.

For now, I use this regular expression to change the indentation: find "^ |\G(?!^) ", replace with “\t”. Having that menu item fixed would be better, though.

P.S.: This thread should actually be called “Spaces to tabs”, I mistyped.

guy038

Hi, The Fartman,

Sorry for my late reply, but we had a hard week at work !

I’m afraid there’s no solution, unless the developer who implemented the TAB to Space and the Space to TAB… features takes a look at his code ! So, in the meantime, the only way seems to be to create a specific S/R, as you did :-)

By the way, to restrict the replacement to the leading spaces, you have found a very clever regex !! Indeed, the \G syntax is rarely used, but it can be very useful in some cases ! In the postscript, at the end of this post, I describe a use of the \G syntax to detect a range of true codons in an RNA sequence :-))

For the record, the \G form is an assertion, which matches at any of the 3 locations, below :

The very beginning of the file
The location of the cursor after the previous match of the regex engine
The location of the cursor, previously moved, on purpose, by the user

However, I think that we had better search for any horizontal blank character and shorten your search regex "^ |\G(?!^) " to (^|\G)\h. Indeed, this new regex also handles lines which contain both leading tabs and leading spaces ! So, we obtain the S/R :

Find what : (^|\G)\h

Replace with : \t

Notes :

This S/R matches the first blank character, at the beginning of a line, and changes it into a tabulation
Then, it matches the next blank character and changes it, again, into a tabulation
As soon as the next blank character is NOT adjacent to the previous one ( because of some non-blank characters ), the replacement process stops, due to the \G syntax
Then, on matching the first blank character of the next line, the S/R process restarts, and so on…

In the end, this S/R only changes leading blank characters into tabulation characters
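
The chaining described in these notes can be mimicked outside Notepad++. Here is a minimal Python sketch, under two assumptions : Python’s re module has neither \G nor \h, so Pattern.match(line, pos), which only succeeds exactly at pos, stands in for \G, and the class [ \t\xa0] stands in for \h. The function name is made up for the illustration :

```python
import re

# Stand-in for \h, limited to tab, space and no-break space (an assumption).
BLANK = re.compile(r"[ \t\xa0]")


def retab_leading_blanks(text: str) -> str:
    """Replace every leading blank character of every line with a tab."""
    out = []
    for line in text.splitlines(keepends=True):
        pos = 0
        # match(line, pos) must succeed exactly at pos, like \G :
        # the chain breaks at the first non-blank character.
        while BLANK.match(line, pos):
            out.append("\t")
            pos += 1
        out.append(line[pos:])  # rest of the line, kept untouched
    return "".join(out)


print(repr(retab_leading_blanks("  a b\n \tc d\ne f\n")))
```

Blanks located after the first non-blank character of a line are never reached by the loop, which is the role that \G plays in the S/R above.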

It took me a long time to find an equivalent regex which does NOT use the \G syntax. But, as you can see, the regex below is not as elegant as our previous regex !

Find what : (\h)|[^\h\r\n].*

Replace with : (?1\t:$0)

Notes :

As long as a blank character (\h) is matched, it is replaced with a tabulation character ( group 1 exists )
As soon as a non-blank character is detected, the regex matches all the rest of the current line, [^\h\r\n].*, and rewrites the entire match $0 ( group 1 does NOT exist )
Then, on the next line, the process restarts and tries to match possible blank characters first, and so on !
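
Unlike the \G version, this second regex carries over almost unchanged to Python’s re module. The sketch below is only an illustration under stated assumptions : the class [ \t\xa0] replaces \h, and, as the conditional replacement (?1\t:$0) is not available in re, a replacement function ( a made-up helper ) plays that role :

```python
import re

# Same structure as (\h)|[^\h\r\n].* with [ \t\xa0] standing in for \h.
PATTERN = re.compile(r"([ \t\xa0])|[^ \t\xa0\r\n].*")


def replace_match(m: re.Match) -> str:
    # Group 1 exists : a leading blank, so emit a tab.
    # Group 1 does not exist : rest of the line, rewritten as is.
    return "\t" if m.group(1) is not None else m.group(0)


def retab(text: str) -> str:
    return PATTERN.sub(replace_match, text)


print(repr(retab("  foo bar\n \tbaz  qux\n")))
```

The blanks between foo and bar are swallowed by the second alternative, together with the rest of their line, so only the leading ones are ever matched by group 1.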

An improvement could be to record a macro for that specific S/R.

Alternatively, you may add the following lines to your shortcuts.xml configuration file, inside the <Macros> node, with an editor OTHER than N++ and, then, re-open Notepad++. I chose the Alt + Ctrl + Shift + T shortcut, for (T)ab, but you may change it, as you like !

<Macro name="S2T" Ctrl="yes" Alt="yes" Shift="yes" Key="84">
    <Action type="0" message="2316" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(^|\G)\h" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="\t" />
    <Action type="3" message="1702" wParam="0" lParam="512" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
</Macro>

Here is a short explanation :

Message “2316” moves the cursor to the very beginning of the current file
Message “1700” initializes the search/replacement process
Message “1601” contains the search pattern
Message “1625” selects the search mode ( here, lParam 2, the regular expression mode )
Message “1602” contains the replacement pattern
Message “1702” represents the sum of the search options
Message “1701” represents the search/replacement command code to execute ( here, 1609, Replace All )
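
To double-check such a macro before pasting it, its actions can be listed next to the meanings given above. This is a small Python sketch using the standard xml.etree module; the MEANINGS table only restates the list above, the helper name is hypothetical, and reading the Key value 84 as the character code of the letter T is an assumption :

```python
import xml.etree.ElementTree as ET

MACRO_XML = r'''
<Macro name="S2T" Ctrl="yes" Alt="yes" Shift="yes" Key="84">
    <Action type="0" message="2316" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="(^|\G)\h" />
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />
    <Action type="3" message="1602" wParam="0" lParam="0" sParam="\t" />
    <Action type="3" message="1702" wParam="0" lParam="512" sParam="" />
    <Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
</Macro>
'''

MEANINGS = {
    "2316": "move the cursor to the beginning of the file",
    "1700": "initialize the search/replacement process",
    "1601": "search pattern",
    "1625": "search mode",
    "1602": "replacement pattern",
    "1702": "sum of the search options",
    "1701": "command code to execute",
}


def describe_macro(macro_xml: str):
    """Return the shortcut letter and one tuple per <Action> of the macro."""
    root = ET.fromstring(macro_xml.strip())
    key = chr(int(root.get("Key")))  # assumption : Key is a character code
    steps = [
        (a.get("message"), MEANINGS.get(a.get("message"), "?"),
         a.get("lParam"), a.get("sParam"))
        for a in root.findall("Action")
    ]
    return key, steps


key, steps = describe_macro(MACRO_XML)
print("shortcut letter :", key)
for message, meaning, lparam, sparam in steps:
    print(message, "|", meaning, "| lParam", lparam, "| sParam", repr(sparam))
```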

Refer to the documentation, below, for detailed information :

http://docs.notepad-plus-plus.org/index.php/Editing_Configuration_Files#Search_.2F_Replace_encoding

Note : In the documentation, the meaning of the flag value 512, for the message “1702”, is wrong ! The correct meaning is Search goes downwards !

Best Regards,

guy038

P.S. :

Here is, below, an mRNA sequence, translated into proteins by a ribosome of a living cell ( found somewhere, in a regex documentation, to explain the interest of the \G feature ! ) :

....AUGGGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAAUAG

If we remove the start codon AUG and everything before it, as well as the stop codon UAG, we, now, obtain the sequence :

GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA
¯¯¯     ¨¨¨       ¯¯¯      ¯¯¯     ¨¨¨  ¨¨¨                    ¯¯¯        ¨¨¨         ¨¨¨
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890

In that RNA sequence, let’s consider the nucleotide triplet GGU. You can easily see that :

At positions 1, 19, 28 and 64, this triplet is a TRUE codon
At positions 9, 36, 41, 75 and 87, this triplet is NOT a codon

Now :

Place the cursor just before the RNA sequence GGUCGA… ( IMPORTANT )
To understand the effect of the \G feature, try, successively, the different regexes, below

.*?GGU which matches the SHORTEST range of characters, up to a GGU triplet

(\u\u\u)*?GGU which matches the SHORTEST range of TRIPLETS, up to a GGU triplet

\G(\u\u\u)*?GGU which matches the SHORTEST range of CODONS, up to a GGU codon

Notes :

The first regex matches any range of capital letters, whatever its size, followed by a string GGU
The second regex matches any range containing 3*n capital letters, followed by a string GGU ( n >= 0 )
The third regex matches any range containing 3*n capital letters, followed by a string GGU, the range itself starting 3*m capital letters after the beginning of the sequence, so this GGU is a codon ( n and m >= 0 )
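
If you do not have Notepad++ at hand, the three behaviours can be reproduced with a short Python sketch. It rests on two assumptions : [A-Z] stands in for \u, and Pattern.match(SEQ, pos), which must start exactly at pos, stands in for \G, whereas Pattern.search(SEQ, pos) may skip ahead, like a regex without \G. The helper name is made up for the illustration :

```python
import re

SEQ = "GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA"


def ggu_starts(pattern: str, anchored: bool) -> list:
    """Run successive matches and return the 1-based start of each final GGU."""
    regex = re.compile(pattern)
    starts, pos = [], 0
    while True:
        # anchored : the match must begin where the previous one ended (\G)
        m = regex.match(SEQ, pos) if anchored else regex.search(SEQ, pos)
        if m is None:
            break
        starts.append(m.end() - 2)  # the GGU occupies the last 3 characters
        pos = m.end()
    return starts


print(ggu_starts(r".*?GGU", anchored=False))
print(ggu_starts(r"(?:[A-Z]{3})*?GGU", anchored=False))
print(ggu_starts(r"(?:[A-Z]{3})*?GGU", anchored=True))
```

Only the anchored run stops at positions 1, 19, 28 and 64. The unanchored triplet version stays in frame for a while but, once no in-frame GGU is left, it restarts a few characters further and ends on the GGU strings at positions 75 and 87, which are not codons.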

Similarly :

.*GGU          which matches the LONGEST range of characters, up to a GGU triplet

(\u\u\u)*GGU   which matches the LONGEST range of TRIPLETS,   up to a GGU triplet

\G(\u\u\u)*GGU which matches the LONGEST range of CODONS,     up to a GGU codon

For further information on that topic, refer to the links, below :

https://en.wikipedia.org/wiki/Codon

https://en.wikipedia.org/wiki/Start_codon

https://en.wikipedia.org/wiki/Stop_codon

The Fartman

Hi, guy038,

I’m afraid there’s no solution, unless the developer, that implemented the TAB to Space and the Space to TAB… features, would give a glance to his code ! So, in the meanwhile, the only way seems to create a specific S/R, as you did :-)

Yeah, but the question is: does this thread work as a bug report? I assume this is an official Notepad++ forum and the developers keep an eye on it, so the behavior of the “Space to TAB” menu item will be fixed in the next release… or not?

However, I think that we would better to search for any horizontal blank character and shorten your search regex ^ |\G(?!^) as (^|\G)\h. Indeed, this new regex would allow to change lines which contain, both, leading tabs and leading spaces ! So, we obtain the S/R :
Find what (^|\G)\h
Replace with \t

Thanks, that’s a much better regexp, shorter and clearer. The macro is nice too.

guy038

Hello, @the-fartman and All,

I realize that I made some syntax errors in the last part of my previous, very old post ( after the P.S. indication )

So, here is the updated version :

P.S. :

Here is, below, an mRNA sequence, translated into proteins by a ribosome of a living cell ( found somewhere, in a regex documentation, to explain the interest of the \G regex feature ! ) :

....AUGGGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAAUAG

If we remove the start codon AUG and everything before it, as well as the stop codon UAG, we, now, obtain the sequence :

GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA
¯¯¯     ¨¨¨       ¯¯¯      ¯¯¯     ¨¨¨  ¨¨¨                    ¯¯¯        ¨¨¨         ¨¨¨
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890

In that RNA sequence, let’s consider the nucleotide triplet GGU. You can easily see that :

At positions 1, 19, 28 and 64, this triplet is a TRUE codon
At positions 9, 36, 41, 75 and 87, this triplet is NOT a codon

Now :

Place the cursor right before the RNA sequence GGUCGA… ( IMPORTANT )
To understand the effect of the \G feature, try, successively, the 3 regexes, below :

    .*?GGU           which matches the SHORTEST range of characters, up to a GGU triplet

    (\u\u\u)*?GGU    which matches the SHORTEST range of TRIPLETS,   up to a GGU triplet

    \G(\u\u\u)*?GGU  which matches the SHORTEST range of CODONS,     up to a GGU codon

Notes :

The first regex matches any range of capital letters, whatever its size, followed by the first string GGU
The second regex matches any range containing 3*n capital letters, followed by the first string GGU ( n >= 0 )
The third regex matches any range containing 3*n capital letters, followed by the first string GGU, the range itself starting 3*m capital letters after the beginning of the sequence, so this GGU is a codon ( n and m >= 0 )

Similarly, try these 3 regexes below :

    .*GGU            which matches the LONGEST range of characters, up to a GGU triplet

    (\u\u\u)*GGU     which matches the LONGEST range of TRIPLETS,   up to a GGU triplet

    \G(\u\u\u)*GGU   which matches the LONGEST range of CODONS,     up to a GGU codon

Notes :

The first regex matches any range of capital letters, whatever its size, followed by the last string GGU
The second regex matches any range containing 3*n capital letters, followed by the last string GGU ( n >= 0 )
The third regex matches any range containing 3*n capital letters, followed by the last string GGU, the range itself starting 3*m capital letters after the beginning of the sequence, so this GGU is a codon ( n and m >= 0 )
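
The greedy forms can be checked the same way, outside Notepad++. In this Python sketch ( an illustration only ), [A-Z] stands in for \u, Pattern.match(SEQ, pos) stands in for a regex beginning with \G, and Pattern.search(SEQ, pos) for the same regex without \G; the variable names are made up :

```python
import re

SEQ = "GGUCGACUGGUUCUCGAAGGUUUCAAAGGUUCAAGGGUCCGGUAUUCAGUCGUCCGCUCUACUGGUACAAAGGGGGUACCACGACUGGUUCUCGAA"

any_chars = re.compile(r".*GGU")
triplets = re.compile(r"(?:[A-Z]{3})*GGU")

# First attempt, with the cursor right before the sequence (offset 0).
end_any = any_chars.match(SEQ).end()      # up to the last GGU string
end_codons = triplets.match(SEQ).end()    # up to the last in-frame GGU

# Second attempt, starting where the in-frame match ended.
without_g = triplets.search(SEQ, end_codons)  # free to restart further on
with_g = triplets.match(SEQ, end_codons)      # must start exactly there

print(end_any, end_codons)
print(without_g.span() if without_g else None)
print(with_g)
```

On the first attempt, the triplet regex gives the same match with or without \G, ending after the GGU codon at position 64. The difference only shows on the second attempt : without \G, the engine shifts its starting point by two characters and reaches the GGU string at position 87, which is not a codon, whereas the anchored attempt finds nothing.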

For further information on that topic, refer to the links, below :

https://en.wikipedia.org/wiki/Codon

https://en.wikipedia.org/wiki/Start_codon

https://en.wikipedia.org/wiki/Stop_codon