How can I go to a specific column?

astewart77

@mkupper said in How can I go to a specific column?:

It is surprising that the Goto dialog box does not support column numbers.

The GotoLineColumn plugin does that and more: go to a line,column, or show the current line,column (as character or Unicode byte counts) and what's at that position.

My understanding is that the developer tends not to add trivial features that are already well solved by an available plugin.

Alan Kilborn

I was going to suggest that one could move to a desired column by first moving the caret to column 1 on the desired line and then doing a regular-expression search for (?-s).{N} where N is one less than the desired column.

So, for example, if you want column 500, then you’d search for (?-s).{499}. Of course, when you get a hit from that, you need to verify that you have remained on the line of interest and not found some later line.
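A minimal Python sketch of that idea, assuming the document text is available as a plain string. Python's `re` stands in for Notepad++'s engine here, and `column_to_index` is a made-up helper name. Python's `.` already excludes line breaks unless `re.DOTALL` is given, which plays the role of `(?-s)`; note that it counts code points, not the 16-bit words Notepad++ counts. The `m.start() == line_start` test is the "did I stay on my line?" verification:

```python
import re

def column_to_index(text, line_start, column):
    """Index in `text` of 1-based `column` on the line beginning at
    `line_start`, or None if that line is too short (hypothetical helper)."""
    if column < 1:
        raise ValueError('columns start at 1')
    # Equivalent of (?-s).{N} with N = column - 1: '.' does not match '\n'
    pattern = re.compile(r'.{%d}' % (column - 1))
    m = pattern.search(text, line_start)
    # A hit that starts anywhere else came from some later, longer line
    if m is None or m.start() != line_start:
        return None
    return m.end()

text = "short\na much longer line\n"
print(column_to_index(text, 0, 3))   # 2 (the 'o' in "short")
print(column_to_index(text, 0, 10))  # None: the only hit is on line 2
print(column_to_index(text, 6, 10))  # 15
```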

BUT… another reason I can't really recommend doing that is that, while experimenting with it, I found some circumstances with Unicode characters where it doesn’t work correctly, i.e., the caret doesn’t end up in the correct column. :-(

Alan Kilborn

@astewart77 said in How can I go to a specific column?:

The GotoLineColumn plugin does that and more.

I can also recommend this plugin, although I’ve never used it to actually move to a line/column. :-)

I use it to see the bytes that make up multi-byte UTF-8 characters.

But for those that aren’t familiar with this plugin, there are some nice screenshots to take a look at on its HOME PAGE.

The plugin notwithstanding, there is some reasonable argument that if the native product (Notepad++) has a “goto line” feature, then it should also have a “goto column” feature (especially since it already has “goto offset” as a tangential feature).

On the other hand… I’d think that if the native product did have a “goto column” feature, it would be rather simple-minded.

Perhaps something scripted would be a nice addition, where one could do such reasonable (but less simple) things as:

move to absolute line e.g. 7654 or line+column, 1234,56
move to specified column, staying on current line, e.g. *,89 or maybe simpler ,89
move the caret ahead/back a fixed number of lines at a time, e.g. +27 would move 27 lines downward from the caret’s current line, -56 would move the caret upward by 56 lines
move the caret ahead/back a fixed number of columns at a time…
etc etc etc
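As a rough illustration of how such a scripted goto might interpret its input, here is a Python parser for the forms in the list above. The syntax is only what the list proposes, the function name `parse_goto` is hypothetical, and relative column moves are left out because their syntax was never spelled out:

```python
import re

def parse_goto(spec, current_line):
    """Turn a goto spec into (line, column), both 1-based; column is None
    when the spec only names a line. Raises ValueError on anything else."""
    spec = spec.strip()
    if re.fullmatch(r'[+-]\d+', spec):
        # +27 / -56: move relative to the current line, never above line 1
        return max(1, current_line + int(spec)), None
    m = re.fullmatch(r'(\d+|\*)?,(\d+)', spec)
    if m:
        # 1234,56 is line+column; *,89 and ,89 keep the current line
        line_part, column = m.groups()
        line = current_line if line_part in (None, '*') else int(line_part)
        return line, int(column)
    if re.fullmatch(r'\d+', spec):
        # 7654: absolute line
        return int(spec), None
    raise ValueError('unrecognized goto spec: %r' % spec)

for spec in ('7654', '1234,56', '*,89', ',89', '+27', '-56'):
    print(spec, '->', parse_goto(spec, current_line=100))
```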

mkupper

@Alan-Kilborn said in How can I go to a specific column?:

I was going to suggest that one could move to a desired column by first moving the caret to column 1 on the desired line and then doing a regular-expression search for (?-s).{N} where N is one less than the desired column.

Search/replace only knows about 16-bit words, which may or may not be “characters.” For example, ~🤦🏻‍♀️~ looks like a single character bracketed by tilde marks. In Notepad++ you search for that using either ~\x{D83E}\x{DD26}\x{D83C}\x{DFFB}\x{200D}\x{2640}\x{FE0F}~ or ~.{7}~

FACE PALM (U+1F926) which Notepad++ stores as the surrogate pair \x{D83E}\x{DD26}
EMOJI MODIFIER FITZPATRICK TYPE-1-2 (U+1F3FB) which Notepad++ stores as the surrogate pair \x{D83C}\x{DFFB}
ZERO WIDTH JOINER (U+200D)
FEMALE SIGN (U+2640)
VARIATION SELECTOR-16 (U+FE0F)

A 🤦🏻‍♀️ is one character, 5 Unicode code points, 7 UTF-16-style words (which is the Notepad++ “column” and also what we deal with in search/replace), and 17 bytes in UTF-8 (14 in UTF-16); the UTF-8 byte count is what is relevant to Notepad++'s “position” number. Notepad++ does not seem to offer a way to work with the data as “character” or Unicode code-point units.
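Those counts can be checked outside Notepad++ with a few lines of Python. Python's `str` is indexed by code point, so `len()` gives the code-point count; the UTF-16 and UTF-8 figures come from encoding the string:

```python
# FACE PALM + skin-tone modifier + ZWJ + FEMALE SIGN + VARIATION SELECTOR-16
facepalm = '\U0001F926\U0001F3FB\u200D\u2640\uFE0F'

code_points = len(facepalm)                           # Python counts code points
utf16_units = len(facepalm.encode('utf-16-le')) // 2  # 16-bit words
utf8_bytes = len(facepalm.encode('utf-8'))
utf16_bytes = len(facepalm.encode('utf-16-le'))

print(code_points, utf16_units, utf8_bytes, utf16_bytes)  # 5 7 17 14
```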

If you have Notepad++'s View / Show Symbols / Show Non-Printing Characters option turned on, then 🤦🏻‍♀️ is displayed as 🤦🏻 ZWJ ♀️. If you have Notepad++'s Preferences / MISC. / Use DirectWrite turned off, then 🤦🏻‍♀️ displays as a black and white 🤦🏻 ♀️ or a black and white 🤦🏻 ZWJ ♀️, depending on your Show Non-Printing Characters setting.

Characters such as tabs are interesting in that they affect the column position but not the column number.

Thus, a “goto” function gets more complicated if you want it to work the way a random sample of humans think.
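A small Python sketch of the difference between a character index and a display column. It assumes tabs expand to the next multiple of a tab width of 4 (a common setting, but configurable) and that every other code point occupies exactly one column, which ignores wide East Asian characters and the multi-unit sequences discussed above. The function name is hypothetical:

```python
def visual_column(line, index, tab_width=4):
    """0-based display column at which line[index] starts, with tabs
    advancing to the next multiple of tab_width (assumed to be 4)."""
    col = 0
    for ch in line[:index]:
        if ch == '\t':
            col = (col // tab_width + 1) * tab_width  # jump to next tab stop
        else:
            col += 1  # assume every other code point is one column wide
    return col

line = 'ab\tx'
print(line.index('x'), visual_column(line, line.index('x')))  # 3 4
```

Here `x` is only the fourth character of the line, yet the tab pushes it to display column 4 (0-based), i.e., the fifth column on screen.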

Alan Kilborn

@mkupper said in How can I go to a specific column?:

Search/replace only knows about 16-bit words

???

Search should (mostly) understand characters.
Thus . should match one character.
This is my general feeling :-) and not just one that makes something like .{499} put the caret in column 500.

This might be a herculean task for the regex engine with all the really specialized/wacky unicode encodings (note that I’m thinking of some of those brought up HERE), but the user doesn’t care how software arrives at the correct answer…

Characters such as tabs are interesting in that they affect the column position but not the column number.

Yes, tab characters deserve special consideration in a “goto column” implementation.

mkupper

@Alan-Kilborn said in How can I go to a specific column?:

@mkupper said in How can I go to a specific column?:
Search/replace only knows about 16-bit words
???

I had used the phrase “16-bit words” as it seemed to be the most accurate description of what search/replace finds or replaces. For example, you can match ~🤦🏻‍♀️~ by searching for ~.......~ or ~.{7}~. We humans see one character 🤦🏻‍♀️ between the tilde marks, but for the Notepad++ search function to match it, you need to look for a series of seven 16-bit words. You can wild-card the match using dots or search for the more explicit ~\x{D83E}\x{DD26}\x{D83C}\x{DFFB}\x{200D}\x{2640}\x{FE0F}~, which are the seven 16-bit word values that make up a single 🤦🏻‍♀️.

When you copy/paste ~🤦🏻‍♀️~ into Notepad++'s find box, you are copy/pasting seven 16-bit words into the find/search field. Apparently Notepad++ does not use DirectWrite for that field, and it also does not check the Show Non-Printing Characters setting for the find/search fields. Thus for ~🤦🏻‍♀️~ we will always see a black and white ~ 🤦 ♀️ ~ in the find or search field.

I decided to call them “16-bit words” as they are the data stream after UTF-16 encoding but do not have to be correct data. That’s why to search or replace for a U+1F926 we need to put the surrogate pair \x{D83E}\x{DD26} in the search or replace field. The first pass of parsing a search/replace field is to translate things such as \x{D83E} into the 16-bit word value \xD83E.
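The surrogate-pair values for the `\x{....}` form can be computed rather than looked up. In this Python sketch (helper names are hypothetical), the first function applies the standard UTF-16 surrogate formula to one supplementary code point, and the second renders any text as one `\x{HHHH}` escape per 16-bit word, in the form used in the search field above:

```python
def surrogate_pair(cp):
    """High and low surrogates for a code point above U+FFFF."""
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def to_search_escapes(text):
    """Render text as \\x{HHHH} escapes, one per UTF-16 code unit."""
    data = text.encode('utf-16-be')
    units = [int.from_bytes(data[i:i + 2], 'big') for i in range(0, len(data), 2)]
    return ''.join('\\x{%04X}' % u for u in units)

print([hex(u) for u in surrogate_pair(0x1F926)])  # ['0xd83e', '0xdd26']
facepalm = '\U0001F926\U0001F3FB\u200D\u2640\uFE0F'
print(to_search_escapes(facepalm))
# \x{D83E}\x{DD26}\x{D83C}\x{DFFB}\x{200D}\x{2640}\x{FE0F}
```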

I use sed a lot, and that tool converts regular expressions into 8-bit byte streams. To search or replace for a U+1F926, I need to know the encoding of the file I’m working with. In sed I may need to search/replace for the UTF-8 \xF0\x9F\xA4\xA6 or maybe the UTF-16 big-endian encoded \xd8\x3e\xdd\x26. Granted, my sed scripts are usually UTF-8 without a BOM, meaning a U+1F926 is displayed as 🤦 in Notepad++, but I know that sed will be seeing four “characters”: \xF0 \x9F \xA4 \xA6.
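Likewise for the sed side, here is a Python sketch (hypothetical helper name) that spells a string out as `\xHH` byte escapes for a given file encoding. It assumes a sed that accepts `\xHH` in patterns, as GNU sed does:

```python
def sed_byte_escapes(text, encoding):
    """Spell out `text` as \\xHH byte escapes in the given file encoding."""
    return ''.join('\\x%02x' % b for b in text.encode(encoding))

print(sed_byte_escapes('\U0001F926', 'utf-8'))      # \xf0\x9f\xa4\xa6
print(sed_byte_escapes('\U0001F926', 'utf-16-be'))  # \xd8\x3e\xdd\x26
```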

Notepad++ simplifies life compared to sed as you only hit “gotchas” where a dot is not the same as a character when dealing with Unicode outside of the BMP.

Related to this: because a dot is not a character, to apply + to 🤦🏻‍♀️ (i.e., 🤦🏻‍♀️+) I need to enter it in both Notepad++ and sed as (🤦🏻‍♀️)+. Notepad++ sees 🤦🏻‍♀️ as seven 16-bit words, so you wrap it in parentheses. sed sees 17 UTF-8 encoded bytes \xf0\x9f\xa4\xa6\xf0\x9f\x8f\xbb\xe2\x80\x8d\xe2\x99\x80\xef\xb8\x8f, which also need to be inside parentheses so that the regexp + operator applies to the whole sequence.
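The same grouping rule can be demonstrated with Python's `re` module. Python matches code points rather than 16-bit words or bytes, so the unit an unparenthesized `+` binds to is different (the final U+FE0F here), but the fix is identical: group the whole sequence.

```python
import re

facepalm = '\U0001F926\U0001F3FB\u200D\u2640\uFE0F'
three = facepalm * 3

# Without a group, + binds only to the last code point (U+FE0F),
# so the pattern cannot match three face palms in a row.
ungrouped = re.compile(re.escape(facepalm) + '+')
# With a non-capturing group, + repeats the whole sequence.
grouped = re.compile('(?:' + re.escape(facepalm) + ')+')

print(ungrouped.fullmatch(three) is None)    # True
print(grouped.fullmatch(three) is not None)  # True
```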

Coises

@Alan-Kilborn said in How can I go to a specific column?:

Search should (mostly) understand characters.
Thus . should match one character.
This is my general feeling :-) and not just one that makes something like .{499} put the caret in column 500.

This might be a herculean task for the regex engine with all the really specialized/wacky unicode encodings (note that I’m thinking of some of those brought up HERE), but the user doesn’t care how software arrives at the correct answer…

Search of Unicode text in Notepad++ behaves as if it were searching UTF-16 code units (“wide characters” in Windows).
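To experiment with that behavior outside Notepad++, one can re-express a Python string as one element per UTF-16 code unit (Python strings may hold lone surrogates), after which `.` matches a single 16-bit word. This is an illustration only, not how Notepad++ or Boost is implemented, and `utf16_view` is a hypothetical name:

```python
import re
import struct

def utf16_view(text):
    """Return a str with one element per UTF-16 code unit, so that a
    regex '.' matches a 16-bit word (lone surrogates included)."""
    data = text.encode('utf-16-le')
    units = struct.unpack('<%dH' % (len(data) // 2), data)
    return ''.join(chr(u) for u in units)

facepalm = '\U0001F926\U0001F3FB\u200D\u2640\uFE0F'
sample = '~' + facepalm + '~'

print(re.fullmatch(r'~.{5}~', sample) is not None)              # True: code points
print(re.fullmatch(r'~.{7}~', utf16_view(sample)) is not None)  # True: UTF-16 units
```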

The Boost regex library used by Notepad++ can do true Unicode searches, but it requires including the ICU (International Components for Unicode) library.

Having first attempted to make the search in my Columns++ plugin fully Unicode-aware, I found the task daunting enough that I decided to replicate the behavior of Notepad++ instead. The ICU library is an inscrutable monster, and I couldn’t even figure out how to include it in a plugin: it wants to be “installed,” and I could not determine whether there would be a way to incorporate the required pieces in a plugin package (or even which pieces would be required).

I think (it’s already hazy in memory) I concluded that it would be possible, though tricky and error-prone, to devise a custom UTF-8 string iterator which identifies Unicode characters and returns them to Boost regex, but I decided that for a plugin it would be better to replicate the host’s behavior than to add a lot of complexity, maintenance, and potential bugs.
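For a sense of what the first layer of such an iterator involves, here is a Python sketch that walks UTF-8 bytes and yields each code point with its byte offset. It only illustrates the decoding step (it is not a Boost-compatible C++ iterator), it assumes well-formed input, and `iter_utf8` is a hypothetical name. What it yields are code points, which are not necessarily characters:

```python
def iter_utf8(data):
    """Yield (byte_offset, code_point) for each code point in UTF-8 `data`.
    Assumes well-formed input; continuation bytes are not validated."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif lead >> 5 == 0b110:
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:
            length, cp = 4, lead & 0x07
        else:
            raise ValueError('invalid UTF-8 lead byte at offset %d' % i)
        # Each continuation byte contributes its low 6 bits
        for b in data[i + 1:i + length]:
            cp = (cp << 6) | (b & 0x3F)
        yield i, cp
        i += length

print([(off, hex(cp)) for off, cp in iter_utf8('~\U0001F926~'.encode('utf-8'))])
# [(0, '0x7e'), (1, '0x1f926'), (5, '0x7e')]
```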

Adding to all the other horrors is that even if you use UTF-32, so that your code units are Unicode code points, code points are not necessarily characters. I don’t know if the ICU-enabled Boost regex library processes true characters, or just Unicode code points.
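A quick Python illustration of that point: the same visible é can be one code point or two, NFC normalization only merges sequences that have a precomposed form, and the face-palm sequence stays at five code points either way:

```python
import unicodedata

precomposed = '\u00e9'  # é as a single code point
decomposed = 'e\u0301'  # e + COMBINING ACUTE ACCENT
facepalm = '\U0001F926\U0001F3FB\u200D\u2640\uFE0F'

print(len(precomposed), len(decomposed))                        # 1 2
print(unicodedata.normalize('NFC', decomposed) == precomposed)  # True
print(len(unicodedata.normalize('NFC', facepalm)))              # 5
```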

Alan Kilborn

Some good discussions here.
I think we haven’t moved the needle forward any on searching for .{499} being a good way to move to column 500. :-)

PeterJones

@Alan-Kilborn said in How can I go to a specific column?:

Some good discussions here.
I think we haven’t moved the needle forward any on searching for .{499} being a good way to move to column 500. :-)

I was hoping the \X “. on steroids” mentioned in single character matches was going to work, but while it works on ǭ̳̚ (matching one grapheme), it does not work on ~🤦🏻‍♀️~, which even \X counts as seven matches. So apparently the modifiers, ZWJ, and variation selectors are not the same as “combining characters” as far as \X is concerned.

But \X will get you closer on combining-character accented text, even if it doesn’t help with modifier+variation emoji.

So I would suggest ^\X{499} as a slightly better way to move to column 500, if you only have to deal with combining characters but not modifiers/joiners/variation selectors.
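Python's built-in `re` module has no `\X` (the third-party `regex` module does), so here is a deliberately rough Python sketch of grapheme-style grouping for column counting. It is a heuristic, not the Unicode segmentation rules (UAX #29): it attaches marks (categories Mn, Mc, Me, which include the variation selectors), the emoji skin-tone modifiers U+1F3FB to U+1F3FF, ZWJ, and whatever follows a ZWJ to the preceding character, and it ignores cases such as Hangul syllables, flag pairs, and CR LF. Unlike the `\X` result described above, it treats 🤦🏻‍♀️ as one column. All names are hypothetical:

```python
import unicodedata

ZWJ = '\u200D'

def _attaches(ch):
    """Heuristic: does this code point join the character before it?"""
    return (unicodedata.category(ch) in ('Mn', 'Mc', 'Me')  # marks, incl. variation selectors
            or 0x1F3FB <= ord(ch) <= 0x1F3FF                # emoji skin-tone modifiers
            or ch == ZWJ)

def clusters(text):
    """Split text into approximate user-perceived characters."""
    out = []
    for ch in text:
        if out and (_attaches(ch) or out[-1].endswith(ZWJ)):
            out[-1] += ch  # extend the current cluster
        else:
            out.append(ch)  # start a new cluster
    return out

def cluster_column_index(line, column):
    """Code-point index where 1-based `column` starts when each cluster
    counts as one column; None if the line is too short."""
    parts = clusters(line)
    if not 1 <= column <= len(parts) + 1:
        return None
    return sum(len(p) for p in parts[:column - 1])

facepalm = '\U0001F926\U0001F3FB\u200D\u2640\uFE0F'
print(len(clusters('~' + facepalm + '~')))             # 3
print(len(clusters('o\u0328\u0304\u0333\u031a')))      # 1
print(cluster_column_index('~' + facepalm + '~x', 4))  # 7
```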

Alan Kilborn

I’m starting to think that the notion of a “column” number value is a ridiculous concept. :-)

Moving away from regular expressions, here’s a PythonScript for experimentation, that will prompt you for a column number to move the caret to on the current line.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# references:
#  https://community.notepad-plus-plus.org/topic/25579/how-can-i-go-to-a-specific-column
#  for newbie info on PythonScripts, see https://community.notepad-plus-plus.org/topic/23039/faq-desk-how-to-install-and-run-a-script-in-pythonscript

from Npp import *

current_line = editor.lineFromPosition(editor.getCurrentPos())
# getColumn() takes tab widths into account, so this is a 0-based display column
max_column = editor.getColumn(editor.getLineEndPosition(current_line))
while True:
    if max_column == 0:
        notepad.messageBox('Current line is empty (has no columns)', '')
        break
    user_input = notepad.prompt('Enter column (1 - {mc}) to move caret to (on current line):'.format(mc=max_column+1), '', '')
    if user_input is None: break  # user cancelled the prompt
    try:
        desired_column = int(user_input) - 1  # 1-based input -> 0-based column
        if not (0 <= desired_column <= max_column): raise ValueError
    except ValueError:
        continue  # not a number or out of range: ask again
    # findColumn() stops before a tab that would overshoot the desired column
    pos_of_desired_col = editor.findColumn(current_line, desired_column)
    gc = editor.getColumn(pos_of_desired_col)
    if desired_column > gc:
        notepad.messageBox('desired column ({dc}) is greater than existing column ({gc}) on current line; moving to column {gc}'.format(
            dc=desired_column+1, gc=gc+1), '')
    editor.setSel(pos_of_desired_col, pos_of_desired_col)  # caret with empty selection
    break
```
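To see when the script's “greater than existing column” message fires without opening Notepad++, here is a pure-Python model of the `findColumn()` behavior the script relies on: tabs advance to the next tab stop, and if the requested column falls inside a tab, the index of that tab is returned. The tab width of 4 and the name `find_column` are assumptions for the sketch; the real Scintilla function also has to deal with line ends and multi-byte characters:

```python
def find_column(line, column, tab_width=4):
    """Index of the character at 0-based display `column`; if that column
    falls inside a tab, the index of the tab; past the end, len(line)."""
    col = 0
    for i, ch in enumerate(line):
        next_col = (col // tab_width + 1) * tab_width if ch == '\t' else col + 1
        if next_col > column:
            return i  # column starts here, or lies inside this tab
        col = next_col
    return len(line)

print(find_column('a\tb', 2))  # 1: column 2 is inside the tab
print(find_column('a\tb', 4))  # 2: the 'b'
```

For example, on the line `a<TAB>b`, asking the script for column 3 (0-based 2) lands inside the tab, so the caret stops at column 2, which is exactly the case the script reports.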

Note that this does NOT implement any of the grand ideas I had EARLIER for a well-rounded-out “goto line/col” feature (maybe a script for that comes later if we can find a way beyond “weird” column results with some of the stranger unicode characters…).