Search/Find specific Language Syntax ?
-
Hi :)
I have scripts in Python and want to find all numbers that are highlighted as numbers by the Python syntax highlighting of Notepad++. E.g. I don’t want to find numbers in quotes, nor numbers that are commented out.
Since the syntax highlighting of Notepad++ is able to tell all these different numbers apart, I hope there is a way to search/find based on it too?
-
Interesting. The syntax highlighter presumably uses multiple layers of logic, rather than a single regex, to determine whether a string of digits is within a string or not. Doing it in a single regex would be … complicated, in my opinion.
My first thought would be to go to the Generic Regular Expression (regex) Formulas, and then start with the Replacing in a specific zone of text formula. But that would be used to find only those numbers that are inside the quotes. There would be some way of negating that search… but with the complexities of that formula, it might be more trouble than it’s worth (though now that I’ve suggested it, I bet Guy will implement it for you).
My second thought, assuming you don’t have overly complicated Python, is that you could test for digits that only have an even number of single quotes or double quotes before them on the line. I think that might be the best bet.
On my test data of
a = 0
b = '1'
c = '1' + str(2)
d = '1' + str(2) + '3'
e = '1' + str(2) + '3' + str(4)
… the following finds 0, 2, and 4, but not the 1 or 3:
FIND = (?m-s)(?:^|\G)([^'\d\v]*?'[^']*?'[^'\d\v]*?|[^'\v])*?\K\d
MODE = Regular Expression
With a little adaptation, that could be expanded to look for both even numbers of ' and even numbers of " (either with expansion of the character ranges used and some clever backreferencing, or more simply by adding another alternate to the main expression that is identical to the first batch, but with " instead of '), but I will leave that as an exercise for the reader.
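For anyone who would rather experiment with the "even number of quotes before the digit" idea outside the Find dialog, here is a minimal Python sketch of the same parity test. It is not a port of the regex above; the function name `digits_outside_quotes` is made up for illustration, and it shares the idea's limitations (no escaped quotes, no quote characters nested inside the other kind of string, no comments or docstrings).

```python
def digits_outside_quotes(line):
    """Return (column, digit) for digits preceded by an even number of ' and of "."""
    hits = []
    for i, ch in enumerate(line):
        if ch in "0123456789":
            before = line[:i]
            # An even count of each quote character means every quote opened
            # earlier on the line has been closed again (naive assumption).
            if before.count("'") % 2 == 0 and before.count('"') % 2 == 0:
                hits.append((i, ch))
    return hits


test_lines = [
    "a = 0",
    "b = '1'",
    "c = '1' + str(2)",
    "d = '1' + str(2) + '3'",
    "e = '1' + str(2) + '3' + str(4)",
]
for line in test_lines:
    print(line, "->", [digit for _, digit in digits_outside_quotes(line)])
```

On the test data this reports 0, 2, and 4, and skips the quoted 1 and 3, one digit at a time.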
-
@PeterJones Thank you very much for the reply :)
Guess that means that such a feature is not implemented in notepad++?
Where can I suggest adding this feature?
And thanks for the try, but I think a solution implemented in Notepad++ is better than any regular-expression attempt. I also want to exclude numbers in comments (lines starting with #) and within docstrings (text between """ and """, which can span multiple lines).
Adding search/find based on syntax highlighting would also cover a lot of other use cases, and it could be customized in the same way the “programming language syntax highlighting” can be customized.
In case anyone cares why I’m looking for this:
I have many huge Python scripts containing calculations, and I want to switch from the float type to the decimal module for precise calculation. For this I need to replace every single number in my code, e.g. 10 -> Decimal("10"). If I overlook a single one, the script will fail (at runtime).
-
The FAQ section of the Forum explains where to make feature requests.
However, it should be noted that the Syntax Highlighting portion of the code (the “Lexer”) is a completely separate piece of code from any search/replace features in Notepad++ (and is in fact a part of a separate library that Notepad++ uses, but that the Notepad++ developer did not write).
So, if the developer took you up on your request, he would have to re-implement the code elsewhere, and provide an entirely new interface to it, which isn’t covered by either the existing lexer interface or the existing search/replace interface and logic. It would be a huge amount of work, and I have my doubts that he would see the benefit in implementing such a feature for all users.
Further, every programming language – Python, Perl, C++, Lua, … – has a different set of rules for “what’s inside a string”, “what’s not inside a string”, “what’s commented out”, and the like. And I highly doubt he would want to have to implement the separate logic for each of the 80+ languages available to Notepad++. (And then someone would complain that it doesn’t also work for their UDL.) So my guess is there’s very little chance of your request being implemented.
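Since the rules are language-specific, one option outside Notepad++ is to let the language's own tokenizer apply them. The sketch below assumes you can run Python on the scripts in question; it uses the standard-library `tokenize` module, and the helper name `number_tokens` is only illustrative. It lists numeric literals and skips anything inside strings, docstrings, or comments.

```python
import io
import tokenize


def number_tokens(source):
    """Return (line, column, text) for every numeric literal in Python source."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER:
            found.append((tok.start[0], tok.start[1], tok.string))
    return found


sample = '''a = 0
b = '1'
c = '1' + str(2)  # comment 1234 number
"""
123 other 456
"""
x = 9.5
'''
for row, col, text in number_tokens(sample):
    print(f"line {row}, column {col}: {text}")
```

The line and column values could then be used to jump to each hit in the editor. Columns are 0-based, lines are 1-based.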
I mean I also want to exclude numbers in comments
Excluding numbers in comments and docstrings is rather pointless except for readability, because if you accidentally also change a 10 inside a docstring to Decimal(10), it wouldn’t change how the program executes.
If that really bothers you, you can just search for docstrings or comments, and then change them back only inside the docstrings or comments.
If I overlook a single one, the script will fail (at runtime).
Then do verification by running the script under a variety of conditions, to make sure you didn’t miss even a single one, before releasing the script. (If you already have a good test suite, all you would have to do is run your test suite. If you don’t already have one, then write it before making your change.)
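As a complement to running the test suite, a small static check can flag literals that were missed. This is only a sketch under the assumption that every converted number is written directly as `Decimal(<number>)`; the helper name `unwrapped_numbers` is hypothetical, and it would need adjusting if you use the string form `Decimal("10")` (where no bare numeric literal remains at all).

```python
import io
import tokenize


def unwrapped_numbers(source):
    """Return (line, text) for numeric literals not written as Decimal(<number>)."""
    toks = list(tokenize.generate_tokens(io.StringIO(source).readline))
    missed = []
    for i, tok in enumerate(toks):
        if tok.type != tokenize.NUMBER:
            continue
        # A wrapped literal is immediately preceded by the tokens `Decimal` and `(`.
        wrapped = (
            i >= 2
            and toks[i - 1].string == "("
            and toks[i - 2].string == "Decimal"
        )
        if not wrapped:
            missed.append((tok.start[0], tok.string))
    return missed


converted = "a = Decimal(0)\nb = 5  # 7 is only a comment\n"
print(unwrapped_numbers(converted))
```

Numbers inside comments and strings are never reported, because the tokenizer does not treat them as numeric literals.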
. . . -
If you are willing to do multi-step:
- Temporarily differentiate between opening and closing ''' or """ or ' or "
  - FIND: (?s-i)((''')|(""")|(')|("))(.*?)\g{1}
  - REPLACE: <(?2docstrSingle)(?3docstrDouble)(?4quoteSingle)(?5quoteDouble)>$6</(?2docstrSingle)(?3docstrDouble)(?4quoteSingle)(?5quoteDouble)>
  - Replace All
- Put ☺ markers around numbers in docstrings or quotes
  - This comes from replace in specific zone
  - FIND = (?-si:<docstrSingle>|(?!\A)\G)(?s-i:(?!</docstrSingle>).)*?\K(?-si:\d+)
  - REPLACE: ☺$0☺
  - Replace All
  - FIND = (?-si:<docstrDouble>|(?!\A)\G)(?s-i:(?!</docstrDouble>).)*?\K(?-si:\d+)
  - REPLACE: ☺$0☺
  - Replace All
  - FIND = (?-si:<quoteSingle>|(?!\A)\G)(?s-i:(?!</quoteSingle>).)*?\K(?-si:\d+)
  - REPLACE: ☺$0☺
  - Replace All
  - FIND = (?-si:<quoteDouble>|(?!\A)\G)(?s-i:(?!</quoteDouble>).)*?\K(?-si:\d+)
  - REPLACE: ☺$0☺
  - Replace All
- Put ☺ markers around numbers in comments
  - FIND: (?-s)(?-i:#|(?!\A)\G).*?\K(?-i:\d+)
  - REPLACE: ☺$0☺
  - Replace All
- Replace numbers that aren’t surrounded with ☺ with Decimal(…)
  - FIND: (?<![☺\d])\d+(?![☺\d])
  - REPLACE: Decimal\($0\)
  - Replace All
- Remove the ☺ markers
  - FIND: ☺(\d+)☺
  - REPLACE: $1
  - Replace All
- Restore original docstring and quote indicators
  - FIND: (?s-i)</?((docstrSingle)|(docstrDouble)|(quoteSingle)|(quoteDouble))>
  - REPLACE: (?2''')(?3""")(?4')(?5")
  - Replace All

With those steps, the test data
a = 0
b = '1'
c = '1' + str(2)
d = '1' + str(2) + '3'
e = '1' + str(2) + '3' + str(4)
a = 0
b = "1"
c = "1" + str(2)
d = "1" + str(2) + "3"
e = "1" + str(2) + "3" + str(4)
a = 0
'''
45 should match
67 should match
'''
a = 910 # should be decimalized
# comment 1234 number
"""
123 other 456
"""
becomes
a = Decimal(0)
b = '1'
c = '1' + str(Decimal(2))
d = '1' + str(Decimal(2)) + '3'
e = '1' + str(Decimal(2)) + '3' + str(Decimal(4))
a = Decimal(0)
b = "1"
c = "1" + str(Decimal(2))
d = "1" + str(Decimal(2)) + "3"
e = "1" + str(Decimal(2)) + "3" + str(Decimal(4))
a = Decimal(0)
'''
45 should match
67 should match
'''
a = Decimal(910) # should be decimalized
# comment 1234 number
"""
123 other 456
"""
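If running a script over the files is an option, the same protect-then-replace idea can be sketched in Python in a single pass. This is not a port of the Notepad++ regexes above: one pattern matches either a protected zone (docstring, quoted string, comment) or a run of digits, and a callback leaves the zones alone while wrapping the digits. The names `PROTECTED`, `PATTERN`, and `decimalize` are made up for illustration. Like the steps above, it only handles plain integers and knows nothing about escaped quotes, floats such as 9.5, or digits inside identifiers such as x1.

```python
import re

# Zones whose contents must be left untouched, tried in this order.
PROTECTED = "|".join([
    r"'''.*?'''",   # triple-single-quoted block, may span lines
    r'""".*?"""',   # triple-double-quoted block, may span lines
    r"'[^'\n]*'",   # single-quoted string on one line
    r'"[^"\n]*"',   # double-quoted string on one line
    r"#[^\n]*",     # comment up to the end of the line
])
PATTERN = re.compile(rf"(?P<keep>{PROTECTED})|(?P<num>\d+)", re.DOTALL)


def decimalize(source):
    """Wrap integer literals outside strings and comments in Decimal(...)."""
    def swap(match):
        if match.group("num") is not None:
            return f"Decimal({match.group('num')})"
        return match.group("keep")  # protected zone: return it unchanged
    return PATTERN.sub(swap, source)


sample = "\n".join([
    "a = 0",
    "e = '1' + str(2) + '3' + str(4)",
    'd = "1" + str(2) + "3"',
    "'''",
    "45 inside a docstring",
    "'''",
    "a = 910 # comment 1234 number",
])
print(decimalize(sample))
```

Because the protected zones are consumed by the same scan that finds the numbers, no temporary markers are needed. Whichever route you take, work on a copy and check the result before trusting it.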