Misalignment of text when Notepad++ edited files are opened in other editors (notepad, wordpad, google drive anyfile editor, etc.)

Claudia Frank

Looks strange, indeed. I would open the file in a hex editor and check for unusual bytes.

You mentioned that you share this file. Is it possible to download it somewhere?

Cheers
Claudia

Compu chan

Sure, here’s one of the 8086 assembly source files:

https://www.dropbox.com/s/gaxnn56hteuz5n7/g_lib.asm?dl=0

And here’s a screenshot of bad text alignment in the file. The comment block is shifted beyond 78 columns.
It just looks bad, and makes it a little awkward to read in a DOS editor.

https://www.dropbox.com/s/b1b120zj5w28la2/bad_formatting2.png?dl=0

Claudia Frank

@Compu-chan

Looks like you’re mixing tabs and spaces.
Activate View->Show Symbol->Show All Characters to check.

Then try converting with Edit->Blank Operations->TAB to Space.

Cheers
Claudia

Compu chan

Problem solved. Thanks!

Solution:

Settings->Preferences->Language
Check the “Replace by space” box

Edit->Blank Operations->TAB to space

Make some arbitrary change to the file (type a character then delete it), then re-save the file.

Formatting/indentation should be fixed.

Scott Sumner

@Compu-chan said:

So in your original post you provided misinformation:

Tabs have been replaced by spaces

Compu chan

No, I checked the replace tab with spaces box before posting my question.
Someone else on a different forum mentioned it as a possible solution to a similar but different problem.

It only prevents future tabulation symbols from appearing in your file.
It doesn’t get rid of the tabulations that are already present.

You do that with:
Edit->Blank Operations->TAB to space

Scott Sumner

@Compu-chan

Perhaps the wording “Replace by space” in the Preferences is poor: it makes it sound like checking the box will perform an action, rather than simply changing a setting!

Compu chan

Also, I’m not sure if there’s already a way to do this, but being able to perform operations on all open files at the same time would be nice.

I had to do the TAB to spaces operation on about a dozen files one by one to fix them all.
Not a big deal, but slightly inconvenient.

Not sure if I’m supposed to mark this post as solved.
I don’t really see the option to do so.

Scott Sumner

@Compu-chan

There is no option to do that specific operation on a group of files. However, it could be done on a bunch of files at once with a regular expression Replace in Files. Replace \t with however many spaces you want a tab character to be.

If you had changed the mode of this thread to “Ask as a Question” then you could later go in and “Mark as Solved”. You can still do it I think…

guy038

Hi @compu-chan, @scott-sumner and All,

Scott said :

There is no option to do that specific operation on a group of files. However, it could be done on a bunch of files at once with a regular expression Replace in Files. Replace \t with however many spaces you want a tab character to be.

Unfortunately, Scott, it’s not that simple !! The physical length of a tabulation character depends on its position :-(( So I began to investigate a way to simulate all the native Blank Operations of Notepad++ with regex search/replacements.

The main advantage of this method is that you can perform these Blank Operations on multiple files, in the same folder, from the Find in Files dialog :-))
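
To illustrate the point, here is a minimal Python sketch (not from the thread; the helper name `naive_tab_replace` is made up for illustration). It compares a fixed-width replacement of \t with a position-aware expansion, assuming a tab size of 4:

```python
def naive_tab_replace(line: str, tab_size: int = 4) -> str:
    """Replace every tab with a fixed number of spaces, ignoring its column."""
    return line.replace("\t", " " * tab_size)


line = "ab\tcd\tx"

naive = naive_tab_replace(line)   # every tab becomes 4 spaces
aware = line.expandtabs(4)        # every tab pads to the next multiple of 4

print(repr(naive))  # 'ab    cd    x'
print(repr(aware))  # 'ab  cd  x'

# With leading tabs only, both approaches agree.
leading = "\tmov ax, bx"
print(naive_tab_replace(leading) == leading.expandtabs(4))  # True
```

The two results differ as soon as a tab follows other characters, which is why the regexes below have to track the position inside each block of 4 columns.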

The first 3 operations are quite easy to realize :

Trim Trailing Space :
- SEARCH [\x20\t]+$ and REPLACE : leave the zone EMPTY
Trim Leading Space :
- SEARCH ^[\x20\t]+ and REPLACE : leave the zone EMPTY
Trim Leading or Trailing Space :
- SEARCH ^[\x20\t]+|[\x20\t]+$ and REPLACE : leave the zone EMPTY

The following 2 operations are not too difficult to achieve, either !

EOL to Space :
- SEARCH (?<=\x20)\R|(\R) and REPLACE (?1\x20) ( Each line break is replaced by a space, or simply deleted if it is preceded by a space character )
Remove Unnecessary Blank and EOL :
- SEARCH ^[\x20\t]+|[\x20\t]+$ and REPLACE : leave the zone EMPTY ( Deletion of leading and trailing blank characters )
- SEARCH \R and REPLACE \x20 ( Replacement of any line break by a space character )

Now, due to the tabulation’s behaviour, which always stops at column 4*n whatever n > 0, the last 3 Blank Operations are much harder to build, from the regex point of view !!

TAB to Space :
- SEARCH (?-s)(?:()|(.)|(..)|(...))\t|(....) and REPLACE (?1\x20\x20\x20\x20)(?2\2\x20\x20\x20)(?3\3\x20\x20)(?4\4\x20)(?5$0)

Briefly, here are, below, the different cases :

Let C = Any single STANDARD character

	<   =   ()     + \t   ->   \1 + 4 Spaces   Group 1

1	<   =   (C)    + \t   ->   \2 + 3 Spaces   Group 2
 	<   =   (C)    + \t   ->   \2 + 3 Spaces   Group 2

12	<   =   (CC)   + \t   ->   \3 + 2 Spaces   Group 3
  	<   =   (CC)   + \t   ->   \3 + 2 Spaces   Group 3

123	<   =   (CCC)  + \t   ->   \4 + 1 Space    Group 4
   	<   =   (CCC)  + \t   ->   \4 + 1 Space    Group 4

1234<   =   (CCCC)        ->   $0              Group 5

Notes :

Depending on the number of characters preceding the tabulation character, this S/R rewrites these characters, followed by the appropriate number of spaces
Note that if a range of 4 chars does not contain any tabulation, it is simply rewritten ( $0 ). This replacement, seemingly useless, is necessary, however, so that the search goes on with the next block of 4 positions !
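
For readers who want to check the logic outside Notepad++, here is a hedged Python sketch of the same idea. Python’s `re` module has no conditional replacement syntax like (?1...), so a replacement function stands in for it; the names `TAB_TO_SPACE` and `tabs_to_spaces` are made up for illustration, and a tab size of 4 is assumed:

```python
import re

# Same alternation as the SEARCH above. In Python, "." already excludes
# line breaks by default, so the leading (?-s) is dropped.
TAB_TO_SPACE = re.compile(r"(?:()|(.)|(..)|(...))\t|(....)")


def tabs_to_spaces(text: str) -> str:
    """Expand tabs to 4-column tab stops, one block of 4 positions at a time."""
    def rewrite(match):
        group = match.lastindex      # the single group that took part in the match
        if group == 5:               # 4 characters without a tab: rewritten as-is
            return match.group(0)
        # Groups 1..4 hold 0..3 characters; pad with 4..1 spaces respectively.
        return match.group(group) + " " * (5 - group)

    return TAB_TO_SPACE.sub(rewrite, text)


print(repr(tabs_to_spaces("ab\tc")))  # 'ab  c'
```

On simple lines the result can be compared with Python’s built-in `str.expandtabs(4)`, which performs the same position-aware expansion.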

Space to TAB (All) :
- SEARCH (?-s)(?|([^ \t\r\n])\x20(?:\x20[\x20\t]|\t)|([^ \t\r\n]{2})\x20[\x20\t]|([^ \t\r\n]{3})\x20)|(\x20{0,3}\t|\x20{4})|([^ \t\r\n]{1,3}\t|....)
- REPLACE (?1\1\t)(?2\t)(?3$0)

Again, below, here is a recap of all the cases, with their appropriate replacements :

Let C = [^ \t\r\n] = Any single STANDARD character, other than a SPACE or a TABULATION

a 	<   =   (C)    + 1 sp + \t  ->   \1\t   Group 1
a  	<   =   (C)    + 2 sp + \t  ->   \1\t   Group 1
a   <   =   (C)    + 3 sp       ->   \1\t   Group 1

ab 	<   =   (CC)   + 1 sp + \t  ->   \1\t   Group 1
ab  <   =   (CC)   + 2 sp       ->   \1\t   Group 1

abc <   =   (CCC)  + 1 sp       ->   \1\t   Group 1

	<   =   (0 sp + \t)         ->   \t     Group 2
 	<   =   (1 sp + \t)         ->   \t     Group 2
  	<   =   (2 sp + \t)         ->   \t     Group 2
   	<   =   (3 sp + \t)         ->   \t     Group 2
    <   =   (4 sp)              ->   \t     Group 2

a	<   =   (C    + \t)         ->   $0     Group 3
ab	<   =   (CC   + \t)         ->   $0     Group 3
abc	<   =   (CCC  + \t)         ->   $0     Group 3
abcd<   =   (CCCC)              ->   $0     Group 3

Notes :

Quickly, all the cases are divided into 3 main parts, with the appropriate replacements :
- Standard characters followed by a mix of spaces/tabulations must be rewritten, followed by a tabulation ( (?1\1\t) )
- A mix of spaces/tabulations, only, must be replaced by a single tabulation ( (?2\t) )
- Standard characters, followed by a single tabulation, have to be simply rewritten ( (?3$0) )
Note, in the search regex, the special branch reset construct (?|......) , which resets the group numbering at the start of each | alternative inside it. So, whichever branch matches, in our example, the captured characters are always stored in group 1 and will be rewritten according to the conditional form (?1\1\t)

Space to TAB (Leading) :
- SEARCH ^(?:\x20|\t)+ and REPLACE $0#
- SEARCH (?:\x20{4}|\x20{0,3}\t)(?=.*#)#? and REPLACE \t

Here are all the combinations of blank characters which may occur at the beginning of lines :

	<   =   0 sp + \t   ->   \t
 	<   =   1 sp + \t   ->   \t
  	<   =   2 sp + \t   ->   \t
   	<   =   3 sp + \t   ->   \t
    <   =   4 sp        ->   \t

Notes :

In the first S/R, we look, at the beginning of each line, for any range of space or tabulation characters and simply rewrite this range, $0 , followed by a # symbol. Note that you may use, as the mark, any symbol that is absent from your file
In the second S/R, every combination of leading blank characters which is followed, further on, by the # character is changed, along with the mark character when it comes right after, into a single tabulation character !
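
The two steps can be tried out with the following Python sketch (an illustration only; the function name `leading_spaces_to_tabs` is invented and a tab size of 4 is assumed). One case the two S/R do not seem to cover: when the leading blanks end with 1 to 3 spaces short of a tab stop, nothing matches right before the mark, so the # stays in the line. The third substitution below removes such a leftover mark; it is an addition of this sketch, not part of the post above.

```python
import re


def leading_spaces_to_tabs(text: str) -> str:
    """Convert leading blanks to tabs (tab size 4), using '#' as a temporary mark.

    Assumes the text contains no '#' character of its own.
    """
    # Step 1: rewrite each run of leading blanks, followed by the mark.
    text = re.sub(r"^[ \t]+", r"\g<0>#", text, flags=re.MULTILINE)
    # Step 2: each full tab stop located before the mark becomes one tab;
    # the mark is consumed when it comes right after the block.
    text = re.sub(r"(?: {4}| {0,3}\t)(?=.*#)#?", "\t", text)
    # Step 3 (added in this sketch): drop a mark left behind after a
    # trailing run of 1 to 3 spaces.
    text = re.sub(r"^([ \t]*)#", r"\1", text, flags=re.MULTILINE)
    return text


print(repr(leading_spaces_to_tabs("    mov ax, bx")))  # '\tmov ax, bx'
print(repr(leading_spaces_to_tabs("      nop")))       # '\t  nop'
```

Blanks located after the first non-blank character are left untouched, because no mark follows them on the line.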

Best Regards,

guy038

PS :

Let’s consider the special branch reset construct, below :

(a)(?|x(y)z|(p(q)r)|(t)u(v))(z)

We have :

Group 1 = a
Group 2 = y or p(q)r or t
Group 3 = None or q or v
Group 4 = z

With a classical list of alternatives, as below :

(a)(?:x(y)z|(p(q)r)|(t)u(v))(z)

We would get, instead :

Group 1 = a
Group 2 = None or y
Group 3 = None or p(q)r
Group 4 = None or q
Group 5 = None or t
Group 6 = None or v
Group 7 = z
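
The classical numbering can be checked with a short Python sketch. As far as I know, Python’s standard `re` module does not support the branch reset construct (?|...), so only the (?:...) form is shown here; the three test strings are arbitrary examples, one per alternative:

```python
import re

# Classical alternation: every opening parenthesis gets its own number,
# so groups belonging to the branches that did not match stay empty (None).
classic = re.compile(r"(a)(?:x(y)z|(p(q)r)|(t)u(v))(z)")

first = classic.fullmatch("axyzz").groups()
second = classic.fullmatch("apqrz").groups()
third = classic.fullmatch("atuvz").groups()

print(first)   # ('a', 'y', None, None, None, None, 'z')
print(second)  # ('a', None, 'pqr', 'q', None, None, 'z')
print(third)   # ('a', None, None, None, 't', 'v', 'z')
```

With the branch reset form, the same three inputs would fill groups 2 and 3 only, whichever branch matched, as listed in the first table.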

Scott Sumner

@guy038 said:

the physical length of the tabulation character depends on its position

When I said “Replace \t with however many spaces you want a tab character to be” I was considering leading tab characters only, which is really the only consistent way to use tab characters (IMO). I tend to stay away from tab characters in general, but when I work on projects where they are used, it is always in the “tab-indent, space-align” style. This means that tab characters are the only whitespace allowed before the first non-whitespace character on a line, and spaces are the only valid whitespace after it. With this usage, my original Replace operation is valid: a leading tab always advances by the full tab size, so replacing it with that many spaces never shifts anything.

due to the tabulation’s behaviour, which always stops at column 4*n whatever n > 0

Can you explain this? I don’t understand what it means.

guy038

Hi, @scott-sumner,

Indeed, using tabulation characters for leading indentation and space characters everywhere else in lines is the sensible attitude while coding :-D And, therefore, your solution for leading tabs is quite correct !

As for my assertion :

due to the tabulation’s behaviour, which always stops at column 4*n whatever n > 0

I pointed out the fact that a tabulation character always ends at column 4, 8, 12, 16,…, that is to say, at a 4*n position !

Moreover, if c is the column, > 0, where the tabulation character begins, its physical length l, between 1 and 4, can be found with the formula l = 4 - ((c-1) % 4)

BTW, I just realized that all this works only if the tab size value, in Settings > Preferences… > Language > Tab Settings, is 4. In the general case, if tab size = s, the physical length l of a tabulation beginning at column c would be : l = s - ((c-1) % s), with % standing for the mathematical operation modulo !

Cheers,

guy038

P.S : I forgot to give an example :

Let’s suppose the tab size is 7 and a tabulation character begins at column c = 157. This implies that its length l = 7 - ((157-1) % 7) = 5 and it ends at column 157 + 5 - 1 = 161, which is, effectively, a multiple of 7 ( 161 = 7 * 23 )
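
The formula is easy to check with a few lines of Python (a small sketch; the function name `tab_length` is made up for illustration, and columns are counted from 1 as in the post):

```python
def tab_length(column: int, tab_size: int = 4) -> int:
    """Physical length of a tab that begins at the given 1-based column."""
    return tab_size - ((column - 1) % tab_size)


# The example above: tab size 7, tabulation beginning at column 157.
length = tab_length(157, 7)
last_column = 157 + length - 1

print(length)           # 5
print(last_column)      # 161
print(last_column % 7)  # 0, so the tab ends on a multiple of the tab size
```

With the default tab size of 4, a tab beginning at columns 1, 2, 3 and 4 gets a length of 4, 3, 2 and 1 respectively, which matches the four cases handled by the TAB to Space regex.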

Scott Sumner

@guy038

Thanks…what confused me was that it sounded like you were saying that tabstops were always 4 columns…your most-recent post clears that up.
