Manual: chapter Search

Paul Wormer

I was trying to find in the manual the definition of “Token”, as the term appears in “Search > Style All Occurrences of Token”. Unfortunately, I haven’t been able to find it, although I did find “the word at the caret”. I suspect that this is the meaning of “Token”? Does it mean that there is no way to style arbitrary strings?

Further, the following sentence in the manual may pertain to a version of N++ older than 8.4.7:
“You activate smart highlighting through Settings > Preferences > MISC > Smart highlighting, Enable smart highlighting. By default, the highlighting is case insensitive, which may be a problem sometimes. Then just toggle Settings > Preferences > MISC > Highlighting is case sensitive on”.
I did find some settings like this under Settings > Preferences > Highlighting, but not under MISC. Am I right, or am I overlooking something?

Alan Kilborn

@Paul-Wormer said in Manual: chapter Search:

I did find “the word at the caret”.
I suspect that this is the meaning of “Token”?

Your post relates to why I recently requested the user manual stop using “token” as much as it did – because IMO most people don’t know what this term means. This request was granted.

Unfortunately, since the word is still used in the software itself, it shouldn’t have been entirely removed from the manual, and it should definitely be defined somewhere therein. Ideally, “token” would be purged from the software itself, and then we would no longer have to think about it.

BTW, I believe “token”, in this context, alludes back to old string processing in C, where the strtok() function (“string token”, or really, “tokenize string”) was well-used; see REFERENCE.

Being at heart a “coder’s editor”, Notepad++ menus were originally designed with the thought that “everyone knows what a token is”.

Typically, a token refers to a bit of text, probably without spaces, e.g. “one two three” has 3 tokens in it.
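To make that loose sense of “token” concrete, here is a minimal Python sketch that splits the example string on whitespace. It only illustrates the general idea and is not a description of anything Notepad++ does internally; the variable names are made up for the example.

```python
# Split a string into whitespace-separated "tokens".
text = "one two three"
tokens = text.split()  # with no argument, split() breaks on runs of whitespace

print(tokens)       # ['one', 'two', 'three']
print(len(tokens))  # 3
```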

The menu item Style All Occurrences of Token hints that when used without selected text, it will grab the “token” (really, “word”) at the caret, and use that. Obviously, with selected text, the “token” part of the command is misleading.

Paul Wormer

@Alan-Kilborn said in Manual: chapter Search:

@Paul-Wormer said in Manual: chapter Search:

I suspect that this is the meaning of “Token”?

Typically, a token refers to a bit of text, probably without spaces, e.g. “one two three” has 3 tokens in it.

A token is a string with word boundaries? A substring cannot be styled with this command?

Alan Kilborn

@Paul-Wormer said in Manual: chapter Search:

A token is a string with word boundaries?

No. Hopefully my explanation was not that unclear. :-(
I believe you see why a string such as “one two three” has 3 tokens in it?

A substring cannot be styled with this command?

Yes, it can, but you have to select the substring before running the command (with no selection, it grabs the entire string/token at the caret), and you have to make sure Match whole word only is not checked in the Highlighting > Style All Occurrences of Token section of the Preferences.
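The difference between matching a substring and matching whole words only can be sketched in Python. This is an illustration under assumptions, not Notepad++'s actual code: `find_occurrences` and its `whole_word` flag are hypothetical names, and treating “Match whole word only” as a regex word-boundary check is a guess at the behavior.

```python
import re

def find_occurrences(text, target, whole_word=False):
    """Return the start offset of every occurrence of target in text.

    whole_word=True keeps only matches bounded by word boundaries,
    which is an assumed reading of "Match whole word only".
    """
    pattern = re.escape(target)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    return [m.start() for m in re.finditer(pattern, text)]

text = "token tokens tokenize"
print(find_occurrences(text, "token"))                   # [0, 6, 13]
print(find_occurrences(text, "token", whole_word=True))  # [0]
```

With the whole-word restriction off, the selected substring is found inside longer words as well; with it on, only the standalone word matches.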

Alan Kilborn

@Paul-Wormer said:

(Paraphrase) In the manual I saw:

You activate smart highlighting through Settings > Preferences > MISC > Smart highlighting

That text appears HERE in the current manual.

I did find some settings like this under Settings > Preferences > Highlighting, but not under MISC. Am I right

You are indeed correct, the references to MISC are outdated. I created an issue to fix that for the manual, HERE.

Paul Wormer

@Alan-Kilborn said in Manual: chapter Search:

@Paul-Wormer said in Manual: chapter Search:

A token is a string with word boundaries?

No. Hopefully my explanation was not that unclear. :-(
I believe you see why a string such as “one two three” has 3 tokens in it?

I see 3 words 😢. But anyway, I’m glad that substrings can be colored, not just whole words. Thank you for pointing out what I had to unselect in this jungle of preference boxes. I presume that it takes many years to know your way around them?

Alan Kilborn

@Paul-Wormer said in Manual: chapter Search:

I see 3 words

Token is often a synonym for “word”.
But a token can also be other things; in this application it can also mean “selected text” (when the Style… command is invoked).
My quoted string example of 3 words maybe went back unnecessarily to processing (“tokenizing”) C strings; sorry.

I presume that it takes many years to know your way around

I suppose so, but this is true of many things in life.
Ideally, over time, the software gets better (more intuitive), so the know-your-way-around time is shortened.

Lycan Thrope

@Paul-Wormer ,
Just for clarity and precision, as regards programming, a token is an atomic indicator/word that has other value attached, particularly as regards lexical analysis.
See if this works. :)

Token

A lexical token or simply token is a string with an assigned and thus identified meaning. It is structured as a pair consisting of a token name and an optional token value. The token name is a category of lexical unit. Common token names are

    identifier: names the programmer chooses;
    keyword: names already in the programming language;
    separator (also known as punctuators): punctuation characters and paired-delimiters;
    operator: symbols that operate on arguments and produce results;
    literal: numeric, logical, textual, reference literals;
    comment: line, block (whether comments are emitted as tokens depends on the compiler; otherwise they are stripped).

Examples of token values:

Token name 	Sample token values
identifier 	x, color, UP
keyword 	if, while, return
separator 	}, (, ;
operator 	+, <, =
literal 	true, 6.02e23, "music"
comment 	/* Retrieves user data */, // must be negative

Consider this expression in the C programming language:

    x = a + b * 2;

The lexical analysis of this expression yields the following sequence of tokens:

    [(identifier, x), (operator, =), (identifier, a), (operator, +), (identifier, b), (operator, *), (literal, 2), (separator, ;)]

A token name is what might be termed a part of speech in linguistics.

Text excerpt courtesy of Wikipedia.
Essentially, it represents something more than what is visible. For instance, the word may have other data attached (colorization, bold, or italic characteristics to display), but all you see is the word. It is a token that has additional meaning assigned to it by the program. A text editor that allows attributes to be applied to various parts of the text is essentially a lexical analyzer that applies other data to the tokens (words). :)
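For anyone who wants to see the quoted C example in action, here is a small Python sketch of a lexer that produces the same (name, value) pairs. The category names come from the excerpt; the regular expressions, `TOKEN_SPEC`, and `tokenize` are assumptions made for this illustration and only cover this tiny expression, not real C.

```python
import re

# Token categories from the quoted excerpt. The patterns are illustrative
# assumptions that are just broad enough for the example expression.
TOKEN_SPEC = [
    ("literal",    r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator",   r"[+\-*/<=]"),
    ("separator",  r"[;(){}]"),
    ("skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token name, token value) pairs for source."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "skip":  # whitespace separates tokens but is not one
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = a + b * 2;"))
```

Each pair carries the category alongside the visible text, which is the "something more than what is visible" part.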

Alan Kilborn

@Lycan-Thrope

it represents something more than what is visible. For instance, the word may have … colorization…

With that definition, the command Style All Occurrences of Token is prescient, because it only becomes a “token” after the command is executed. :-)

Lycan Thrope

@Alan-Kilborn ,
Actually, the wording of that command only verifies that it is already a token. It is simply adding a Style data element to its data, such as its position in a document. :)