UDL documentation (()) operator

PeterJones

I cannot find it documented at Ivan’s site (though he does mention the (( )) syntax in delimiters example 2 and comments example 4), and I know it’s not in the official UDL overview, so I went looking through the code. LexUser.cpp#L222-L225 and a couple of other places in that file reference \v and \b.

It looks like \v means “any whitespace, including newlines”, whereas \b means just space and tab. I might have to start exploring where those can be used in the dialog box (and whether they require the (()) notation or not).
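To make that reading concrete, here is a minimal sketch of how the two markers could behave if the interpretation above is right. It is not the LexUser.cpp code: the helper names are made up for illustration, and the "one or more characters" rule is my assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption from the post above: '\b' stands for spaces/tabs only,
// '\v' for any whitespace including newlines. Names are hypothetical.
static bool isBlank(char c) { return c == ' ' || c == '\t'; }
static bool isAnyWhitespace(char c) { return isBlank(c) || c == '\r' || c == '\n'; }

// Returns true if `text` matches `pattern` exactly, where each '\b' or '\v'
// in the pattern consumes one or more characters of its whitespace class
// (the "one or more" rule is an assumption, not taken from the lexer).
bool matchesWithMarkers(const std::string& pattern, const std::string& text)
{
    std::size_t t = 0;
    for (char p : pattern)
    {
        if (p == '\b' || p == '\v')
        {
            const std::size_t start = t;
            while (t < text.size() &&
                   (p == '\b' ? isBlank(text[t]) : isAnyWhitespace(text[t])))
                ++t;
            if (t == start)
                return false;   // the marker needs at least one whitespace character
        }
        else
        {
            if (t >= text.size() || text[t] != p)
                return false;   // literal characters must match exactly
            ++t;
        }
    }
    return t == text.size();
}

// Usage check for the "end if" example discussed in this thread.
void checkMarkerExamples()
{
    assert(matchesWithMarkers("end\bif", "end if"));
    assert(!matchesWithMarkers("end\bif", "end\nif"));
    assert(matchesWithMarkers("end\vif", "end\nif"));
}
```

Under this reading, the only difference between the two markers is whether a line break between "end" and "if" still counts as a match.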

dshuman52

You confirmed the slash type, and I tested it in a UDL; as I remember, it appears to work as noted inside the ((…)) operator. Thanks.

Thanks also for the code reference where I can see if there are other options for what appears to be a potentially valuable operator.

dshuman52

LexUser lines 634 through 642, part of static “inline void SubGroup(const char * s, vvstring & vec, bool group=false)”, seem to address the detection of the ((…)) construct.

Around LexUser line 1679, I should ensure that I ignore escape characters when there is no end-of-delimiter entry.

Around line 1697, I should interpret the lack of a closing delimiter as an indication that this set of “delimiters” (one of 1-8) is actually a forward keyword list instead of a delimiter list.

My first impression is that the place to code the forward keyword list option is between lines 1678 and 1679, where one set of code can achieve both requirements.

My first approximation of the pseudocode for this change would be (obviously this is untested):

if there are no delimclose entries {
    set state delimstate based on characters matching only delimopen
    …
    /* we are done processing this text (forward keyword);
       there are no additional nested processing requirements for the located keyword,
       and we need to determine what is next based on a subsequent character
    */
    sc.SetState(SCE_USER_STYLE_IDENTIFIER);

} else {
    LexUser lines 1679-1723
}

Thanks for the assistance. I will report the results when I get them.

Alan Kilborn

@dshuman52

You are talking about Scintilla code, and that is fine, except when you talk about changing it you are talking in the wrong place. Scintilla is a separate project from Notepad++.

PeterJones

@Alan-Kilborn said in UDL documentation (()) operator:

You are talking about Scintilla code

Ummm… Not so sure. LexUser.cpp is in the scintilla folder, because it’s a lexer… but if you grab the scintilla source code, it doesn’t include LexUser.cpp, and the copyright notice for LexUser.cpp says,

this file is part of notepad++
Copyright (C)2003 Don HO ...

whereas the other lexers in that folder retain their scintilla copyright notices. I think the UDL lexer is Notepad++ controlled, despite being compiled in that folder.

(edit: removed email address from copyright to avoid spambot harvesting)

Alan Kilborn

@PeterJones said in UDL documentation (()) operator:

LexUser.cpp is in the scintilla folder

Ah, okay, maybe just a “bad” location for the file. :-)

PeterJones

@dshuman52 said in UDL documentation (()) operator:

I tested It in a UDL and it appears to work as noted inside the ((…)) operator as I remember.

If you’re still around: my experiments with ((end\bif)) didn’t seem to match what you were describing. Was it delimiters, comments, or folding where you got it to work? Could you post a screenshot of the UDL window where you have that example working (feel free to black out anything that you think is proprietary)?

dshuman52

I have the same question about where this works. ((EOL)) does not work as a folder 2 close where I tried it. To prove a concept I also tried using /r/n /r /n in the list and, not surprisingly, that did not work. The tinyxml… code indicates that the escaped XML hex values for \r and \n could work, but they did not (I included them in this sentence, but they do not display). Furthermore, once they are entered into the file, they do not appear when reopening the UDL configuration window, even though they still appear in the file.

I am pretty sure a \b^\v as the middle of the folder type 2 is NOT actually doing what I expect, which is to act as a continuation character. So while the syntax is understood, using it does not appear to yield the desired results.

I successfully completed the forward keyword hack of the delimiter logic and have submitted it for inclusion. At the same time I also fixed some colorizing issues that occurred when items (in my case, numbers and keywords) appeared as the last items in a file and were not followed by white space, and I have submitted that code for approval too.

If I remember correctly, what I read was that “end\bif” would match end if, end, or if (three things). However, ((endif end\bif)) should only match two things. That is, I believe ((end end\bif)) is expected to match two things, which is what many likely want “end\bif” to do. Using the quotes “…” instead of the ((…)) as documented was supposed to change the behavior of the contents.

No. Other than understanding the syntax, I am not sure I have had success yet with \b or \v outside delimiters, inside double quotes, or inside double parens.

dshuman52

One of the sentences above contains escaped XML hex values for \r and \n (hex D and hex A) that do not display in the text.

dshuman52

Alan, I made the same mistake you did: I submitted my changes to Scintilla and was told that it was Notepad++ code. I had the additional embarrassment of contacting the wrong person.

dshuman52

@dshuman52 I have been digging through the code. It appears that only some lists understand the ((…)) operator.

The following lists may support the ((…)) operator:

commentLineOpen, commentLineContinue, commentLineClose, commentOpen, commentClose, delim1Open, delim1Escape, delim1Close, delim2Open, delim2Escape, delim2Close, delim3Open, delim3Escape, delim3Close, delim4Open, delim4Escape, delim4Close, delim5Open, delim5Escape, delim5Close, delim6Open, delim6Escape, delim6Close, delim7Open, delim7Escape, delim7Close, delim8Open, delim8Escape, delim8Close, FoldersInCode1Open, FoldersInCode1Middle, FoldersInCode1Close, Operators1

Whether ((end end\bif)) works in the above lists is unknown to me.

The numbers lists and the following lists do not appear to understand ((…)):

All numbers list, OPERATOR2, FOLDER_IN_CODE2(all), FOLDER_IN_COMMENT(all), KEYWORD1, KEYWORD2, KEYWORD3, KEYWORD4, KEYWORD5, KEYWORD6, KEYWORD7, KEYWORD8

All lists appear to have some support for \b and \v; however, it applies only between two items, like “end\bif” or “end\vif”, and apparently not as a prefix or suffix. This is from a quick viewing of the code. I would not guarantee any of these observations.

dshuman52

@dshuman52 Peter, if you are still there: from what I have read, getting a \b or \v through XML 1.0 is not possible, and even with XML 1.1, which may or may not be supported, support for these characters is on a case-by-case basis. I believe there are permissible control characters higher in the ASCII range that could be mapped to \b and \v. Would that be an acceptable-sounding approach if it were to be coded? Using 2 chars, a \ and a b, I have improved on the existing code. Does anyone have suggestions on what characters to map to what characters?
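For anyone checking the XML 1.0 claim, the set of allowed characters is the Char production in that specification, and it can be written as a small predicate. This is only a sketch of that one rule; the function name is made up, and it says nothing about what tinyxml itself accepts.

```cpp
#include <cassert>

// XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Hypothetical helper name; takes a Unicode code point.
bool isXml10Char(unsigned long cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Usage check: '\b' (0x08) and '\v' (0x0B) fall outside the allowed set,
// while tab, line feed and carriage return are inside it.
void checkXmlChars()
{
    assert(!isXml10Char(0x08));
    assert(!isXml10Char(0x0B));
    assert(isXml10Char(0x09));
}
```

So if \b or \v are to survive a round trip through the UDL XML file, they would have to be stored as something else (for example the two-character sequences mentioned above) and converted after loading.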

dshuman52

@dshuman52

The following code substitutes \b or \v for a space inside single or double quotes, respectively. That is how you activate the existing code: “end if” means end\vif, and ‘end case’ means end\bcase.

Apparently this happens only in the following buffers:

// OPERATORS2, FOLDERS_IN_CODE2, FOLDERS_IN_COMMENT, KEYWORDS1-8


void ScintillaEditView::setUserLexer(const TCHAR *userLangName)
{
	int setKeywordsCounter = 0;
    execute(SCI_SETLEXER, SCLEX_USER);

	UserLangContainer * userLangContainer = userLangName? NppParameters::getInstance().getULCFromName(userLangName):_userDefineDlg._pCurrentUserLang;

	if (!userLangContainer)
		return;

	UINT codepage = CP_ACP;
	UniMode unicodeMode = _currentBuffer->getUnicodeMode();
	int encoding = _currentBuffer->getEncoding();
	if (encoding == -1)
	{
		if (unicodeMode == uniUTF8 || unicodeMode == uniCookie)
			codepage = CP_UTF8;
	}
	else
	{
		codepage = CP_OEMCP;	// system OEM code page might not match user selection for character set,
								// but this is the best match WideCharToMultiByte offers
	}

	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("fold"), reinterpret_cast<LPARAM>("1"));
	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("userDefine.isCaseIgnored"),		  reinterpret_cast<LPARAM>(userLangContainer->_isCaseIgnored ? "1":"0"));
	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("userDefine.allowFoldOfComments"),  reinterpret_cast<LPARAM>(userLangContainer->_allowFoldOfComments ? "1":"0"));
	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("userDefine.foldCompact"),		  reinterpret_cast<LPARAM>(userLangContainer->_foldCompact ? "1":"0"));

    char name[] = "userDefine.prefixKeywords0";
	for (int i=0 ; i<SCE_USER_TOTAL_KEYWORD_GROUPS ; ++i)
	{
		itoa(i+1, (name+25), 10);
		execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>(name), reinterpret_cast<LPARAM>(userLangContainer->_isPrefix[i] ? "1" : "0"));
	}

	for (int i = 0 ; i < SCE_USER_KWLIST_TOTAL ; ++i)
	{
		WcharMbcsConvertor& wmc = WcharMbcsConvertor::getInstance();
		const char * keyWords_char = wmc.wchar2char(userLangContainer->_keywordLists[i], codepage);

		if (globalMappper().setLexerMapper.find(i) != globalMappper().setLexerMapper.end())
		{
			execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>(globalMappper().setLexerMapper[i].c_str()), reinterpret_cast<LPARAM>(keyWords_char));
		}
		else // OPERATORS2, FOLDERS_IN_CODE2, FOLDERS_IN_COMMENT, KEYWORDS1-8
		{
			char temp[max_char];
			bool inDoubleQuote = false;
			bool inSingleQuote = false;
			bool nonWSFound = false;
			int index = 0;
			for (size_t j=0, len = strlen(keyWords_char); j<len && index < (max_char-1); ++j)
			{
				if (!inSingleQuote && keyWords_char[j] == '"')
				{
					inDoubleQuote = !inDoubleQuote;
					continue;
				}

				if (!inDoubleQuote && keyWords_char[j] == '\'')
				{
					inSingleQuote = !inSingleQuote;
					continue;
				}

				if (keyWords_char[j] == '\\' && (keyWords_char[j+1] == '"' || keyWords_char[j+1] == '\'' || keyWords_char[j+1] == '\\'))
				{
					++j;
					temp[index++] = keyWords_char[j];
					continue;
				}

				if (inDoubleQuote || inSingleQuote)
				{
					if (keyWords_char[j] > ' ')		// copy non-whitespace unconditionally
					{
						temp[index++] = keyWords_char[j];
						if (nonWSFound == false)
							nonWSFound = true;
					}
					else if (nonWSFound == true && keyWords_char[j-1] != '"' && keyWords_char[j+1] != '"' && keyWords_char[j+1] > ' ')
					{
						temp[index++] = inDoubleQuote ? '\v' : '\b';
					}
					else
						continue;
				}
				else
				{
					temp[index++] = keyWords_char[j];
				}

			}
			temp[index++] = 0;
			execute(SCI_SETKEYWORDS, setKeywordsCounter++, reinterpret_cast<LPARAM>(temp));
		}
	}

 	char intBuffer[32];

	sprintf(intBuffer, "%d", userLangContainer->_forcePureLC);
	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("userDefine.forcePureLC"), reinterpret_cast<LPARAM>(intBuffer));

	sprintf(intBuffer, "%d", userLangContainer->_decimalSeparator);
	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("userDefine.decimalSeparator"), reinterpret_cast<LPARAM>(intBuffer));

	// at the end (position SCE_USER_KWLIST_TOTAL) send id values
	sprintf(intBuffer, "%" PRIuPTR, reinterpret_cast<uintptr_t>(userLangContainer->getName())); // use numeric value of TCHAR pointer
	execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("userDefine.udlName"), reinterpret_cast<LPARAM>(intBuffer));

	sprintf(intBuffer, "%" PRIuPTR, reinterpret_cast<uintptr_t>(_currentBufferID)); // use numeric value of BufferID pointer
    execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>("userDefine.currentBufferID"), reinterpret_cast<LPARAM>(intBuffer));

	for (int i = 0 ; i < SCE_USER_STYLE_TOTAL_STYLES ; ++i)
	{
		Style & style = userLangContainer->_styleArray.getStyler(i);

		if (style._styleID == STYLE_NOT_USED)
			continue;

		char nestingBuffer[32];
		sprintf(nestingBuffer, "userDefine.nesting.%02d", i );
		sprintf(intBuffer, "%d", style._nesting);
		execute(SCI_SETPROPERTY, reinterpret_cast<WPARAM>(nestingBuffer), reinterpret_cast<LPARAM>(intBuffer));

		setStyle(style);
	}
}
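
To try the quote handling without building Notepad++, the else-branch above can be restated as a standalone function. This is my own sketch, not the shipped code: the function name is made up, it uses std::string instead of the fixed-size temp buffer, and it guards the j-1 / j+1 lookups at the ends of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Standalone restatement of the quoted-space substitution (hypothetical name).
// A space between two words inside "..." becomes '\v'; inside '...' it becomes '\b'.
std::string substituteQuotedSpaces(const std::string& in)
{
    std::string out;
    bool inDoubleQuote = false;
    bool inSingleQuote = false;
    bool nonWSFound = false;    // never reset, same as in the original loop
    const std::size_t len = in.size();

    for (std::size_t j = 0; j < len; ++j)
    {
        const char c = in[j];
        const char prev = (j > 0) ? in[j - 1] : '\0';
        const char next = (j + 1 < len) ? in[j + 1] : '\0';

        // Quote characters toggle state and are not copied.
        if (!inSingleQuote && c == '"') { inDoubleQuote = !inDoubleQuote; continue; }
        if (!inDoubleQuote && c == '\'') { inSingleQuote = !inSingleQuote; continue; }

        // \" \' and \\ copy the escaped character literally.
        if (c == '\\' && (next == '"' || next == '\'' || next == '\\'))
        {
            out += next;
            ++j;
            continue;
        }

        if (inDoubleQuote || inSingleQuote)
        {
            if (c > ' ')
            {
                out += c;               // non-whitespace is copied unchanged
                nonWSFound = true;
            }
            else if (nonWSFound && prev != '"' && next != '"' && next > ' ')
            {
                out += inDoubleQuote ? '\v' : '\b';
            }
            // any other whitespace inside quotes is dropped
        }
        else
        {
            out += c;                   // outside quotes everything is copied
        }
    }
    return out;
}

// Usage check for the two examples given in the post above.
void checkQuotedExamples()
{
    assert(substituteQuotedSpaces("\"end if\"") == "end\vif");
    assert(substituteQuotedSpaces("'end case'") == "end\bcase");
}
```

Note that a run of several spaces inside quotes collapses to a single marker, because only the last space before the next word passes the `next > ' '` test.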