Misalignment of text when Notepad++-edited files are opened in other editors (Notepad, WordPad, Google Drive anyfile editor, etc.)



  • I’m using a monospace font (Dejavu Sans Mono).
    Global Bold Font style is enabled.
    Tabs have been replaced by spaces.

    I like the way I have the editor set up now and I’d rather not compromise the look of the characters.
    However, I do not want to exceed a text line length of 78 characters (for compatibility with DOS), and
    with this misalignment issue some lines are shifted way too far to the right.

    Is there a box or something I need to uncheck, or what? How do I fix this?



  • @Compu-chan

    I’m not sure I understand correctly, but I assume that you break each line yourself by pressing Return, correct?
    If this is the case, then, do the other editors also use the same font settings?
    If this isn’t the case, then how did you break the line on char 78?

    Cheers
    Claudia



  • I broke the line using the included visible vertical edge tool set to column 78.

    Here are some screenshots:

    https://www.dropbox.com/s/qdkadolahm66b71/bad_formatting.png?dl=0
    https://www.dropbox.com/s/r2pyeoaf8w5quk9/bad_formatting1.png?dl=0



  • Just to clear up any confusion, my problem is that within Notepad++ all lines of text line up neatly and nicely.
    When these files are then opened in other editors, some of the lines of text are shifted over to the right by varying amounts.
    Some lines of text are shifted so far that they pass my desired max column limit of 78 characters.
    I’m trying to figure out how to avoid this problem.

    I share this assembly code on the internet, and I don’t want others to open the files in their text editor of choice and find the lines all jagged and uneven.

    I want it to look nice across editors. I know this is impossible with some editors, but surely there’s a way to mitigate this problem in Wordpad/Notepad at least.



  • @Compu-chan

    Looks strange, indeed. I would open the file in a hex editor and check for unusual bytes.

    You mentioned that you share this file, is it possible to download it already?

    Cheers
    Claudia



  • Sure, here’s one of the 8086 assembly source files:

    https://www.dropbox.com/s/gaxnn56hteuz5n7/g_lib.asm?dl=0

    And here’s a screenshot of bad text alignment in the file. The comment block is shifted beyond 78 columns.
    It just looks bad, and makes it a little awkward to read in a DOS editor.

    https://www.dropbox.com/s/b1b120zj5w28la2/bad_formatting2.png?dl=0



  • @Compu-chan

    Looks like you’re mixing tabs and spaces.
    Activate “Show All Characters” to see them.

    Maybe convert the tabs by using Edit->Blank Operations->TAB to Space?

    Cheers
    Claudia



  • Problem solved. Thanks!

    Solution:

    Settings->Preferences->Language
    Check the “Replace by space” box

    Edit->Blank Operations->TAB to space

    Make some arbitrary change to the file (type a character then delete it), then re-save the file.

    Formatting/indentation should be fixed.



  • @Compu-chan

    So in your original post you provided misinformation:

    “Tabs have been replaced by spaces”



  • No, I checked the replace tab with spaces box before posting my question.
    Someone else on a different forum mentioned it as a possible solution to a similar but different problem.

    It only prevents future tabulation symbols from appearing in your file.
    It doesn’t get rid of the tabulations that are already present.

    You do that with:
    Edit->Blank Operations->TAB to space
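
To confirm that a saved file really is free of tab characters and stays within the limit, a small check outside Notepad++ can help. This is only a minimal sketch in Python: the function name `find_problem_lines` is made up for illustration, and the default limit of 78 is simply the value mentioned in this thread.

```python
def find_problem_lines(text, limit=78):
    """Return (line_number, reason) pairs for lines that contain a tab
    character or are longer than `limit` characters."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "contains a tab"))
        if len(line) > limit:
            problems.append((number, "longer than %d characters" % limit))
    return problems


# Hypothetical sample: line 2 still holds a tab, line 3 is 80 characters long.
sample = "mov ax, bx\n\tint 21h\n" + "x" * 80 + "\n"
for number, reason in find_problem_lines(sample):
    print(number, reason)
```

An empty result means the file contains no tabs and no line over the limit, so its columns no longer depend on the tab size of whichever editor opens it.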



  • @Compu-chan

    Perhaps the verbiage “Replace by space” in the Preferences is poor; it makes it sound as if checking the box performs an action rather than simply changing a setting!



  • Also, I’m not sure if there’s already a way to do this, but being able to perform operations on all open files at the same time would be nice.

    I had to do the TAB to Space operation on about a dozen files one by one to fix them all.
    Not a big deal, but slightly inconvenient.

    Not sure if I’m supposed to mark this post as solved.
    I don’t really see the option to do so.



  • @Compu-chan

    There is no option to do that specific operation on a group of files, however, it could be done with a regular expression Replace in Files on a bunch of files at once. Replace \t with how many ever spaces you want a tab character to be.
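
For a batch of files, another route is a short script run outside Notepad++. The sketch below is an assumption-laden example rather than a Notepad++ feature: the function name `expand_tabs_in_folder`, the `*.asm` pattern and the tab size of 4 are all illustrative, and it works on raw bytes, so it assumes a single-byte encoding. It is worth trying on a copy of the folder first.

```python
from pathlib import Path


def expand_tabs_in_folder(folder, pattern="*.asm", tab_size=4):
    """Expand tabs in every matching file; return the names of changed files.

    Works on bytes so that line endings and the encoding are left untouched
    (assumption: one byte per displayed character).
    """
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        data = path.read_bytes()
        # bytes.expandtabs pads each tab to the next tab stop, line by line
        expanded = data.expandtabs(tab_size)
        if expanded != data:
            path.write_bytes(expanded)
            changed.append(path.name)
    return changed
```

Files without any tab are left alone, so running it twice is harmless.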

    If you had changed the mode of this thread to “Ask as a Question” then you could later go in and “Mark as Solved”. You can still do it I think…



  • Hi @compu-chan, @scott-sumner and All,

    Scott said :

    There is no option to do that specific operation on a group of files, however, it could be done with a regular expression Replace in Files on a bunch of files at once. Replace \t with how many ever spaces you want a tab character to be.

    Unfortunately, Scott, it’s not that simple !! The physical length of a tabulation character depends on its position :-(( So I began to investigate a way to simulate all the native Blank Operations of Notepad++ with regex search/replacements.

    The main advantage of this method is that you can perform these Blank Operations on multiple files belonging to the same folder, in the Find in Files dialog :-))
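
To see why the position matters, here is a small Python illustration (not Notepad++ itself; the sample line is made up and a tab size of 4 is assumed). It compares a fixed "every tab becomes 4 spaces" replacement with a position-aware expansion.

```python
line = "mov\tax, bx\t; copy"

# Fixed replacement: every tab becomes exactly 4 spaces
naive = line.replace("\t", " " * 4)

# Position-aware: every tab pads up to the next multiple of 4 columns
aware = line.expandtabs(4)

print(repr(naive))
print(repr(aware))
```

The two results differ as soon as a tab does not start right after a tab stop, which is the usual situation for tabs placed in the middle of a line.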


    The first 3 operations are quite easy to reproduce :

    • Trim Trailing Space :

      • SEARCH [\x20\t]+$ and REPLACE with nothing ( leave the field EMPTY )
    • Trim Leading Space :

      • SEARCH ^[\x20\t]+ and REPLACE with nothing ( leave the field EMPTY )
    • Trim Leading or Trailing Space :

      • SEARCH ^[\x20\t]+|[\x20\t]+$ and REPLACE with nothing ( leave the field EMPTY )
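
The same three patterns can be tried outside Notepad++, for instance with Python's `re` module. This is just a sketch: the sample text is made up, and it assumes `\n` line endings, because Python's `$` does not match in front of a `\r`.

```python
import re

# Hypothetical sample with leading and trailing blanks
text = "  mov ax, bx \t\n\tint 21h\n"

# re.MULTILINE makes ^ and $ work line by line, as they do in Notepad++
trim_trailing = re.sub(r"[\x20\t]+$", "", text, flags=re.MULTILINE)
trim_leading = re.sub(r"^[\x20\t]+", "", text, flags=re.MULTILINE)
trim_both = re.sub(r"^[\x20\t]+|[\x20\t]+$", "", text, flags=re.MULTILINE)

print(repr(trim_trailing))
print(repr(trim_leading))
print(repr(trim_both))
```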

    The following 2 operations are not too difficult to achieve either :

    • EOL to Space :

      • SEARCH (?<=\x20)\R|(\R) and REPLACE (?1\x20) ( Any line break is replaced by a space, or simply deleted if it is preceded by a space character )
    • Remove Unnecessary Blank and EOL :

      • SEARCH ^[\x20\t]+|[\x20\t]+$ and REPLACE with nothing ( leave the field EMPTY ) ( Deletion of leading and trailing blank characters )

      • SEARCH \R and REPLACE \x20 ( Replacement of any line break by a space character )
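
For readers who want to test the EOL to Space logic outside Notepad++, here is a rough Python equivalent. Python's `re` module supports neither `\R` nor the conditional replacement `(?1...)`, so this sketch swaps in an explicit line-break alternation and a replacement function; the name `eol_to_space` is made up.

```python
import re

# Stand-in for \R, which Python's re module does not support
EOL = r"(?:\r\n|\n|\r)"


def eol_to_space(text):
    """Replace each line break by a space, or delete it when it already
    follows a space character."""
    pattern = r"(?<=\x20)" + EOL + "|(" + EOL + ")"
    # Group 1 is set only when the line break is NOT preceded by a space
    return re.sub(pattern, lambda m: " " if m.group(1) else "", text)


print(repr(eol_to_space("a \nb\r\nc")))
```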


    Now, due to the tabulation’s behaviour, which always stops at column 4*n whatever n > 0, the last 3 Blank Operations are much harder to build, from the “regex” point of view !!

    • TAB to Space :

      • SEARCH (?-s)(?:()|(.)|(..)|(...))\t|(....) and REPLACE (?1\x20\x20\x20\x20)(?2\2\x20\x20\x20)(?3\3\x20\x20)(?4\4\x20)(?5$0) ( each \x20 stands for one space )

    Briefly, here are the different cases :

    Let C = Any single STANDARD character
    
    	<   =   ()     + \t   ->   \1 + 4 Spaces   Group 1
    
    1	<   =   (C)    + \t   ->   \2 + 3 Spaces   Group 2
     	<   =   (C)    + \t   ->   \2 + 3 Spaces   Group 2
    
    12	<   =   (CC)   + \t   ->   \3 + 2 Spaces   Group 3
      	<   =   (CC)   + \t   ->   \3 + 2 Spaces   Group 3
    
    123	<   =   (CCC)  + \t   ->   \4 + 1 Space    Group 4
       	<   =   (CCC)  + \t   ->   \4 + 1 Space    Group 4
    
    1234<   =   (CCCC)        ->   $0              Group 5
    

    Notes :

    • Depending on the number of characters preceding the tabulation character, this S/R rewrites these characters, followed by the appropriate number of spaces

    • Note that if a range of 4 characters does not contain any tabulation, it is simply rewritten ( $0 ). This replacement, seemingly useless, is nevertheless necessary in order to stay aligned on blocks of 4 positions while looking for the next tabulation !
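
The TAB to Space search above can be checked outside Notepad++ with a small Python port. This is a sketch under assumptions: a tab size of 4, `\n` line endings, one column per character, and a replacement function instead of the conditional replacement syntax, which Python's `re` does not have. The name `tab_to_space` is made up.

```python
import re

# Same search as above; Python's "." already excludes \n by default
TAB_TO_SPACE = re.compile(r"(?:()|(.)|(..)|(...))\t|(....)")


def tab_to_space(text):
    def rewrite(match):
        if match.group(5) is not None:
            # 4 characters without any tab: rewritten unchanged
            return match.group(0)
        # 0 to 3 characters, then the tab: pad up to 4 columns
        prefix = match.group(0)[:-1]
        return prefix + " " * (4 - len(prefix))

    return TAB_TO_SPACE.sub(rewrite, text)


sample = "a\tbc\tdef\tg\n\tx"
print(repr(tab_to_space(sample)))
# Cross-check against Python's built-in, position-aware expansion
print(tab_to_space(sample) == sample.expandtabs(4))
```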


    • Space to TAB (All) :

      • SEARCH (?-s)(?|([^ \t\r\n])\x20(?:\x20[\x20\t]|\t)|([^ \t\r\n]{2})\x20[\x20\t]|([^ \t\r\n]{3})\x20)|(\x20{0,3}\t|\x20{4})|([^ \t\r\n]{1,3}\t|....)

      • REPLACE (?1\1\t)(?2\t)(?3$0)

    Again, here is a recapitulation of all the cases, with their appropriate replacements :

    Let C = [^ \t\r\n] = Any single STANDARD character, other than a SPACE or a TABULATION
    
    a 	<   =   (C)    + 1 sp + \t  ->   \1\t   Group 1
    a  	<   =   (C)    + 2 sp + \t  ->   \1\t   Group 1
    a   <   =   (C)    + 3 sp       ->   \1\t   Group 1
    
    ab 	<   =   (CC)   + 1 sp + \t  ->   \1\t   Group 1
    ab  <   =   (CC)   + 2 sp       ->   \1\t   Group 1
    
    abc <   =   (CCC)  + 1 sp       ->   \1\t   Group 1
    
    	<   =   (0 sp + \t)         ->   \t     Group 2
     	<   =   (1 sp + \t)         ->   \t     Group 2
      	<   =   (2 sp + \t)         ->   \t     Group 2
       	<   =   (3 sp + \t)         ->   \t     Group 2
        <   =   (4 sp)              ->   \t     Group 2
    
    a	<   =   (C    + \t)         ->   $0     Group 3
    ab	<   =   (CC   + \t)         ->   $0     Group 3
    abc	<   =   (CCC  + \t)         ->   $0     Group 3
    abcd<   =   (CCCC)              ->   $0     Group 3
    

    Notes :

    • In short, all the cases are divided into 3 main parts, each with its own replacement :

      • Standard characters followed by a mix of spaces/tabulations must be rewritten, followed by a tabulation ( (?1\1\t) )

      • A mix of spaces/tabulations only must be replaced by a single tabulation ( (?2\t) )

      • Standard characters followed by a single tabulation simply have to be rewritten ( (?3$0) )

    • Note, in the search regex, the special branch reset construct (?|......), which resets the sub-expression count at the start of each | alternative inside it. So, whichever branch matches, in our example the matched standard characters are always stored in group 1 and are replaced according to the conditional form (?1\1\t)


    • Space to TAB (Leading) :

      • SEARCH ^(?:\x20|\t)+ and REPLACE $0#

      • SEARCH (?:\x20{4}|\x20{0,3}\t)(?=.*#)#? and REPLACE \t

    Here are all the combinations of blank characters which may occur at the beginning of a line :

    	<   =   0 sp + \t   ->   \t
     	<   =   1 sp + \t   ->   \t
      	<   =   2 sp + \t   ->   \t
       	<   =   3 sp + \t   ->   \t
        <   =   4 sp        ->   \t
    

    Notes :

    • In the first S/R, we look, at the beginning of each line, for any range of space or tabulation characters and simply rewrite this range, $0 , followed by a # symbol. Note that you may use any symbol that is absent from your file as the mark

    • In the second S/R, every combination of leading blank characters that is followed, further on, by the # character is changed, along with the mark character when it comes next, into a single tabulation character !
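
As a point of comparison, the leading-blank conversion can also be sketched in Python without a marker character. This is a different technique from the two-step S/R above, not a port of it: it measures the displayed width of the leading blanks and rebuilds them as tabs plus any leftover spaces. The name `leading_space_to_tab` and the default tab size of 4 are assumptions.

```python
import re


def leading_space_to_tab(text, tab_size=4):
    """Rewrite the leading blanks of every line as tabs, keeping 1 to 3
    leftover spaces when the width is not a multiple of tab_size."""
    def rewrite(match):
        # Displayed width of the leading run, tabs included
        width = len(match.group(0).expandtabs(tab_size))
        return "\t" * (width // tab_size) + " " * (width % tab_size)

    return re.sub(r"^[\x20\t]+", rewrite, text, flags=re.MULTILINE)


print(repr(leading_space_to_tab("    a\n  \tb\n      c\n")))
```

Blanks that come after the first non-blank character of a line are left untouched, so comment alignment made with spaces is preserved.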

    Best Regards,

    guy038

    PS :

    Let’s consider the special branch reset construct, below :

    (a)(?|x(y)z|(p(q)r)|(t)u(v))(z)

    We have :

    • Group 1 = a

    • Group 2 = y or p(q)r or t

    • Group 3 = None or q or v

    • Group 4 = z

    With a classical list of alternatives, as below :

    (a)(?:x(y)z|(p(q)r)|(t)u(v))(z)

    We would get, instead :

    • Group 1 = a

    • Group 2 = None or y

    • Group 3 = None or p(q)r

    • Group 4 = None or q

    • Group 5 = None or t

    • Group 6 = None or v

    • Group 7 = z
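
The classical numbering can be verified with Python's built-in `re` module, as sketched below. Note that `re` does not support the branch reset construct `(?|...)`, so only the second pattern is shown; the three subject strings are made-up examples, one per alternative.

```python
import re

# Classical alternation: every pair of parentheses gets its own number
classic = re.compile(r"(a)(?:x(y)z|(p(q)r)|(t)u(v))(z)")

for subject in ("axyzz", "apqrz", "atuvz"):
    # groups() lists groups 1 to 7; unused groups show up as None
    print(subject, classic.fullmatch(subject).groups())
```

Whichever alternative matches, the groups belonging to the other two alternatives stay empty, which is exactly what the branch reset construct avoids.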



  • @guy038 said:

    the physical length of the tabulation character depends on its position

    When I said “Replace \t with how many ever spaces you want a tab character to be” I was considering leading tab characters only, which is really the only consistent way to use tab characters (IMO). I tend to stay away from tab characters in general, but when I work on projects where they are used, it is always in the “tab-indent, space-align” style. This means that tab characters are the only whitespace that are allowed before the first non-whitespace character on a line, and spaces are the only valid whitespace after the first non-whitespace character on a line. With this usage, my original Replace operation is valid–there should never be a situation in which moving to the next tab-stop is not the full # of spaces (if the conversion were to be done).
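
Under that "tab-indent, space-align" convention, the fixed-width replacement can be sketched in Python as follows. The function name `expand_leading_tabs` and the default of 4 spaces per tab are assumptions for illustration; only tabs at the very start of a line are touched.

```python
import re


def expand_leading_tabs(text, tab_size=4):
    """Replace each leading tab by tab_size spaces; tabs elsewhere are kept."""
    return re.sub(
        r"^\t+",
        lambda m: " " * (tab_size * len(m.group(0))),
        text,
        flags=re.MULTILINE,
    )


sample = "\t\tmov ax, bx  ; comment\n\tret\n"
print(repr(expand_leading_tabs(sample)))
```

Because a run of tabs at the start of a line always begins at a tab stop, every tab in it is exactly `tab_size` columns wide, so the result matches a position-aware expansion for such files.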

    due to the tabulation’s behaviour, which always stops at column 4*n whatever n > 0

    Can you explain this? I’m not understanding what this means.



  • Hi, @scott-sumner,

    Indeed, using leading tabulation characters, and space characters everywhere else in a line, is the sensible approach while coding :-D And, therefore, your solution for leading tabs is quite correct !

    As for my assertion :

    due to the tabulation’s behaviour, which always stops at column 4*n whatever n > 0

    I pointed out the fact that a tabulation character always ends at column 4, 8, 12, 16,…, that is to say, at a 4*n position !

    Moreover, if c is the column ( c > 0 ) where the tabulation character begins, its physical length l, between 1 and 4, can be found with the formula l = 4 - ((c-1) % 4)

    BTW, I just realized that all this works only if the tab size value, in Settings > Preferences… > Language > Tab Settings, is 4. In the general case, if tab size = s, the physical length l of a tabulation beginning at column c is : l = s - ((c-1) % s), with % standing for the modulo operation !

    Cheers,

    guy038

    P.S : I forgot to give an example :

    Let’s suppose the tab size is 7 and a tabulation character begins at column c = 157. This implies that its length l = 7 - ((157-1) % 7) = 5 and it ends at column 157 + 5 - 1 = 161, which is, effectively, a multiple of 7 ( 161 = 7 * 23 )
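
The formula l = s - ((c-1) % s) is easy to check with a few lines of Python. The function names `tab_length` and `tab_end_column` are made up for this sketch; columns are counted from 1, as in the example above.

```python
def tab_length(c, s=4):
    """Width of a tab that begins at 1-based column c, for tab size s."""
    return s - ((c - 1) % s)


def tab_end_column(c, s=4):
    """Last column occupied by a tab that begins at column c."""
    return c + tab_length(c, s) - 1


# The example from the post: tab size 7, tab beginning at column 157
print(tab_length(157, 7), tab_end_column(157, 7))
```

With the default tab size of 4, tabs beginning at columns 1, 2, 3 and 4 have lengths 4, 3, 2 and 1, and all of them end at column 4.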



  • @guy038

    Thanks…what confused me was that it sounded like you were saying that tabstops were always 4 columns…your most-recent post clears that up.

