Hi, looking for a regex that calculates distance between 2 numbers, then adjust the 2nd number to a minimum of 30.
-
Hi everyone,
I’m looking for a regex (or another tool solution that does math) that calculates the distance between 2 numbers, then adjusts only the 2nd number to a minimum of 30 from the first number. If it’s over 30, I’d like to leave it alone.
I have many text files that contain song lyric timing in a special format for karaoke. I’ve recently discovered that a batch of these files, which our company obtained in a catalog acquisition, has some words that cut off too soon and cause an error in the video burning application, so I’d like to lengthen only those words. The word(s) in question always occur at the bottom of a ‘page’ on the karaoke screen, so there is a distinct ‘marker’ to isolate them for a regex. The regex needs to take into account that the problem can occur once in a file or dozens of times.
First example:
BODY ROCK/ 6094/6101/0

-----------------------------
PAGEV2
The line break, hyphened line and PAGEV2 always occur before the next page is displayed, as a consistent marker. (The only time these 3 markers don’t occur is at the end of the file, where there are only 2 line breaks.) The text of lyrics at the beginning can be one word or many words. The portion I need the regex to handle is the first 2 numbers with a forward slash between them. Notice the distance between 6094 and 6101 is 7. I would like it to be a minimum of 30, so the regex would need to change the 2nd number to 6124. The regex also needs to take into account that the 2 numbers can be as short as 1 digit and as long as 5, 6 or 7 digits. If the numbers are 9995/9999/0, then the regex needs to change them to 9995/10025/0. The first number should always remain the same.
Here are some more examples:
ME/ 20534/20548/0

-----------------------------
PAGEV2
ME/ 21326/21347/0

-----------------------------
PAGEV2
NEED/ 5316/5320/0

-----------------------------
PAGEV2
Thank you for any help solving this problem for me.
Jeff
-
No regex engine will do this for you. Regular expressions always treat numbers as text. PythonScript is your friend.
from Npp import editor

def on_match(m):
    n1s, n2s = m.groups()
    n1, n2 = int(n1s), int(n2s)
    n2 = max(n1 + 30, n2)
    return f'{n1}/{n2}'

editor.rereplace(r'(?-i)(\d+)/(\d+)(?=/0\R\R-{29}\RPAGEV2)', on_match)
Regex explanation
(?-i)(\d+)/(\d+)(?=/0\R\R-{29}\RPAGEV2)
(?-i) makes it so that PAGEV2 is matched case-sensitively (the Notepad++ default is to ignore case).
(\d+) matches and “captures” one or more consecutive digits.
(?=/0\R\R-{29}\RPAGEV2) looks ahead to make sure that the two numbers separated by a slash are followed by another slash, then a zero, then two newlines, then a line with 29 dashes, then a line containing the text PAGEV2.
I started with
BODY ROCK/ 6094/6101/0

-----------------------------
PAGEV2
BODY ROCK/ 167/198/0

-----------------------------
PAGEV2
BODY ROCK/ 975/1000/0

-----------------------------
PAGEV2
NFOERNEON/ 603/637/0

-----------------------------
PAGEV2
and got
BODY ROCK/ 6094/6124/0

-----------------------------
PAGEV2
BODY ROCK/ 167/198/0

-----------------------------
PAGEV2
BODY ROCK/ 975/1005/0

-----------------------------
PAGEV2
NFOERNEON/ 603/637/0

-----------------------------
PAGEV2
-
I’m developing a plugin — Columns++ — that might be able to help you. At present, it wouldn’t be a single step, though.
You’d need to apply the Calculate command from Columns++ to create a new column with the correct numbers, then apply a regular expression to put the new numbers into place.
I’m assuming that there’s no problem with looking at lines other than those at the end of the page — it’s just that they will never have the “less than 30” problem. If that assumption is incorrect, we would need to add another step to the description below; if it is true, it’s easier not to worry about selecting only the end of page lines and just “correct” them all, even though most will be unchanged.
Please back up your originals and work on copies you can replace if something goes wrong. If you have Notepad++ installed in the default location, you should be able to use the “Quick Installer” for the latest release of Columns++.
If word wrap is on, turn it off. Start with the cursor at the start of the file and nothing selected, choose Calculate… from the Columns++ plugin menu and click OK to enclose the entire document in a rectangular selection; then, in the Calculate dialog, use this formula:
reg(2)-reg(1)<30?reg(1)+30:reg(2)
and this regular expression:
(\d+)/(\d+)/\d+$
checking the settings Skip unmatched lines and Insert at left and unchecking Numeric aligned; other settings should be OK at defaults, but be sure Thousands separator is None, Suppress trailing zeros is checked (or Decimal places is set to 0), Time format is unchecked and Tabbed is checked.
That will add the correct second number, followed by a tab, at the left of every line that has the pattern of three numbers separated by diagonals at the end of the line. Once you verify that, you’ll use an ordinary Notepad++ Replace (not the one in Columns++) to restore the original format of those lines; specify Find what:
^(\d+)\h*\t(.*\d+/)(\d+)(/\d+)$
and Replace with:
\2\1\4
being sure that Regular expression is checked and . matches newline is unchecked, then use Replace All.
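To see how the two steps fit together, here is a rough Python emulation of them on single lines. It is only a sketch: the function names calculate_step and restore_step are made up for illustration, Python's re module stands in for the Columns++ and Notepad++ engines, and [ \t] replaces \h because Python's re does not support \h.

```python
import re

# Step 1 pattern: the regular expression given to the Calculate dialog.
CALC = re.compile(r"(\d+)/(\d+)/\d+$")
# Step 2 pattern: the Find what expression, with \h written as [ \t].
RESTORE = re.compile(r"^(\d+)[ \t]*\t(.*\d+/)(\d+)(/\d+)$")


def calculate_step(line):
    """Emulate Calculate with 'Skip unmatched lines' and 'Insert at left'."""
    m = CALC.search(line)
    if m is None:
        return line  # unmatched lines are left alone
    first, second = int(m.group(1)), int(m.group(2))
    # Same logic as the formula reg(2)-reg(1)<30?reg(1)+30:reg(2)
    new_second = first + 30 if second - first < 30 else second
    return "{}\t{}".format(new_second, line)


def restore_step(line):
    """Emulate the Replace: move the new number into the second slot."""
    return RESTORE.sub(r"\2\1\4", line)


line = "BODY ROCK/ 6094/6101/0"
print(restore_step(calculate_step(line)))
```

Lines that do not end in three slash-separated numbers pass through both steps unchanged, which is why it is safe to run them over the whole document.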
-
@Mark-Olson Hi Mark, I was able to get your regex code to properly select the number/number, but I’m unable to get the NppExec plugin to install in Windows 11 ARM. Even after multiple installation attempts, it doesn’t appear in the plugin list, so I can’t select it to test the Python portion. Maybe PythonScript is not compatible with ARM.
I also tried the code on my Mac with BBEdit and Terminal, but every time I run the Python code, it fails at editor.rereplace with a NameError. Would you happen to know what I can replace it with so it works directly in Terminal or BBEdit? I’m unable to find a substitute. Thank you.
-
@Jeff-Michaels
You’re using an ARM64 architecture, not x64 or x86? Yeah, I do think that PythonScript only works on x64 or x86, but I wouldn’t know. Maybe somebody else can say. In any case NppExec and BbEdit won’t help you here - the Npp module exists only in PythonScript and it means absolutely nothing except when you’re running Notepad++ with PythonScript installed.
In any case, if you can get PythonScript to work, you can download one of the .zip folders from here, unzip it, put it in the plugins folder of your Notepad++ installation, and then create a script in the scripts directory of the plugin folder containing the script I posted above.
The ColumnsPlusPlus solution Coises described might make more sense if you have no intention of learning Python. PythonScript does have a non-negligible impact on the startup time of Notepad++, and while I love it because I’m familiar with Python, it’s harder to justify all this other overhead if you’re not.
-
@Mark-Olson said in Hi, looking for a regex that calculates distance between 2 numbers, then adjust the 2nd number to a minimum of 30.:
The ColumnsPlusPlus solution Coises described
I don’t have an ARM version of it, though, nor do I have any idea how I would test one, even if by some magical chance I could get it to build. :-( I didn’t even know there was an ARM version of Notepad++… or of “real” (non-RT) Windows, for that matter.
-
@Mark-Olson Hi again Mark. I had forgotten that I have an Intel iMac with Windows 10 (in Parallels) still installed lying around here. I was able to install Python and Notepad++. Then I went to the Plugins Admin and added the PythonScript plugin directly from there. Then I pasted your code into a new Notepad++ window and saved it as a script so it was accessible from the Plugins/Python Script/Scripts menu. Then I opened a new window in Notepad++ and pasted in one of the files I want to fix. After running the script, nothing happened. What I really want to do is place a bunch of files (they have a .kbp extension, but are text files) in a folder on the Windows desktop and batch process them all in one shot with your script. Would you be able to help me do this? Thank you so much again.
-
@Jeff-Michaels This is the error I’m getting at the moment
File "C:\Program Files\Notepad++\plugins\PythonScript\scripts\kbpfix.py", line 5
return f'{n1}/{n2}'
^
SyntaxError: invalid syntax
-
@Jeff-Michaels said in Hi, looking for a regex that calculates distance between 2 numbers, then adjust the 2nd number to a minimum of 30.:
This is the error I’m getting at the moment
Are you using a 2.x version of PythonScript?
If so, then it is probably caused by it not liking the f-string used in that line, as f-strings are a Python 3 feature (3.6 and later).
In Python 2 you could write that line as:
return '{n1}/{n2}'.format(n1=n1, n2=n2)
-
(no success yet) I’m also trying to run the python code from Terminal on a Mac. Each time I run it, it comes up with a NameError. I placed the script in a folder on the desktop along with one of the text files I want to fix. As it cannot define the editor, the script will not run. Does anyone have any advice on how to get the script to run in Terminal with files in a folder on the desktop?
jeff@jeff ~ % python3 /Users/jeff/Desktop/test/kbpfix2.py
Traceback (most recent call last):
File "/Users/jeff/Desktop/test/kbpfix2.py", line 14, in <module>
editor.rereplace(r' (\d+)/(\d+)(?=/\d+[\r\n]+-+$)', fix_distance)
NameError: name 'editor' is not defined
-
UPDATE: I am so close, but I still cannot get this to work. I’m trying to stick with Terminal on macOS, but it produces an error at ‘editor.rereplace’. Is there an alternative to this part of the code to get it to work on a Mac? (I usually use grep in Bbedit as my application)
I was able to point to a folder on my desktop that will process all the files in it with .kbp extensions. The Python script fails as it doesn’t know what ‘editor.rereplace’ is.
Here is my new terminal command to process the files in a folder (python script is located inside this folder as well):
for f in *.kbp; do
python3 kbpfix.py "$f" "${f%-fixed.kbp}"
done
This is once again the original Python script that fails at the editor portion:
def on_match(m):
    n1s, n2s = m.groups()
    n1, n2 = int(n1s), int(n2s)
    n2 = max(n1 + 30, n2)
    return f'{n1}/{n2}'

editor.rereplace(r'(?-i)(\d+)/(\d+)(?=/0\R\R-{29}\RPAGEV2)', on_match)
this is the error that appears in the macOS terminal after running:
editor.rereplace(r'(?-i)(\d+)/(\d+)(?=/0\R\R-{29}\RPAGEV2)', on_match)
NameError: name 'editor' is not defined
Is there a suggestion as to what I can replace for the editor to get it to work on macOS in terminal?
-
I think you’re off-track.
This forum is about Notepad++ and its plugins.
We thought we were talking about PythonScript.
But apparently this has evolved into something that doesn’t relate to Notepad++, so we can’t continue to support you here.
If you want to continue on with Python, I suggest you find a Python forum to keep going with your questions.
-
The script I wrote above can’t be run except through PythonScript in Notepad++. It does not work like a normal Python script. If you want to start a chat with me, I can help you write a normal, non-PythonScript-based Python script to solve your problem in a way that’s unrelated to Notepad++.
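For readers wondering what such a standalone script could look like, here is a minimal sketch. It is not the script that was worked out in chat. Everything in it is an assumption made for illustration: the names fix_text and fix_folder, the choice to write results to a separate output folder, and the decision to work on raw bytes so that the files' encoding and line endings are left alone. Python's re module has no \R, so \r?\n is used instead, which covers Windows and Unix line endings only.

```python
import re
import sys
from pathlib import Path

MIN_GAP = 30          # minimum distance between the first and second number
NL = rb"\r?\n"        # stand-in for \R, which Python's re does not support

# Two numbers separated by a slash, followed by /0, two line breaks,
# a line of 29 dashes and a line starting with PAGEV2.
PATTERN = re.compile(rb"(\d+)/(\d+)(?=/0" + NL + NL + rb"-{29}" + NL + rb"PAGEV2)")


def on_match(m):
    first, second = int(m.group(1)), int(m.group(2))
    if second - first >= MIN_GAP:
        return m.group(0)  # already far enough apart: leave untouched
    # Keep the first number exactly as written; only the second changes.
    return m.group(1) + b"/" + str(first + MIN_GAP).encode("ascii")


def fix_text(data):
    """Return a corrected copy of the bytes of one file."""
    return PATTERN.sub(on_match, data)


def fix_folder(src, dst):
    """Write a corrected copy of every .kbp file in src into dst."""
    dst_dir = Path(dst)
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src).glob("*.kbp")):
        (dst_dir / path.name).write_bytes(fix_text(path.read_bytes()))
        count += 1
    return count


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print("processed", fix_folder(sys.argv[1], sys.argv[2]), "files")
    else:
        print("usage: python3 kbpfix.py INPUT_FOLDER OUTPUT_FOLDER")
```

Writing to a separate folder keeps the originals intact, so the results can be compared before anything is replaced.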
-
@Mark-Olson Hi again Mark. I was just successful at getting the script to run in Notepad++ with the PythonScript plugin (yeah). I had a folder of 77 files and I needed to open each file separately, run the script on it, and save. It corrected my files perfectly. I have hundreds more to process though. Is there any way to run the script on the entire folder at once?
-
@Jeff-Michaels
Glad you got it to work! Unfortunately, PythonScript isn’t very well-equipped to manipulate the file system outside of Notepad++. You could in principle write a solution that opened every file in the directory in Notepad++, applied my script above, closed the file, and moved on to the next. However, that would be very slow compared to just writing a normal Python script that doesn’t use Notepad++ at all.
Since my preferred solution to your larger problem doesn’t use Notepad++, I’ll turn to the chat.
-
@Mark-Olson Do you have a preferred way to chat or email? my email is jeff@sybersound.com
-
Just want to close by making a general note for posterity.
Out of curiosity, I tried using ChatGPT with Jeff’s problem, and it provided a solution that was suboptimal but serviceable nonetheless. And of course it was much much faster than me.
So I guess what I’d say is this:
If you’re looking for guidance on how to solve a problem that requires Notepad++, you’ve come to the right place. Not only that, I would very strenuously advise against trusting anything that ChatGPT says about Notepad++. However, if you don’t have any particular reason to believe that Notepad++ needs to be in the loop, ChatGPT might potentially be useful, as it excels at coming up with decent solutions to simple, routine problems, so long as you check its work.
-
To keep the thread on-topic, I see that the OP has things working within Notepad++ and apparently has fixed some of his files using Notepad++ with Notepad++'s flavor of PythonScript.
The OP then asked if there is a way to do the script on all of the files in a folder. I believe it’s possible but don’t know the best practice for 1) How can someone open dozens, hundreds, or thousands of files in Notepad++? 2) How can someone then run a particular PythonScript from within Notepad++ on all open tabs?
To address #1, something I do is construct a Notepad++ session file that has all of the files that I want to work on and then run Notepad++.exe with -openSession filename. While that works for me, I don’t know if that’s the best practice. It may well be that the OP is satisfied with running Notepad++.exe *.txt from the command line and letting Notepad++'s built-in wildcard expansion take over. I just tried it on a folder with 963 files and … a minute later had 963 new tabs open. I was surprised that “close all tabs to the right” took about 45 seconds, but it worked.
I don’t know how to address #2 - Is it possible to run a PythonScript on all open tabs?
-
@mkupper said in Hi, looking for a regex that calculates distance between 2 numbers, then adjust the 2nd number to a minimum of 30.:
Is it possible to run a PythonScript on all open tabs?
No. What you would do is, in a single run of a single script, iterate over the open file tabs:
for (filename, bufferID, index, view) in notepad.getFiles():
    notepad.activateFile(filename)
    ....
After a file is “activated”, then the editor object can manipulate the data of the activated tab, e.g., perhaps do an editor.replace(), etc.
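Putting that loop together with the replacement posted earlier in the thread, a sketch might look like the following. Treat it as a starting point only: the function name fix_all_open_tabs, the .kbp filter and the decision to save each tab straight away are assumptions added for illustration, and the try/except around the import is there only so the file can be loaded outside Notepad++ without raising an error. str.format is used instead of an f-string so it also runs under PythonScript 2.

```python
try:
    # The Npp module only exists inside Notepad++ with PythonScript installed.
    from Npp import editor, notepad
except ImportError:
    editor = notepad = None

PATTERN = r'(?-i)(\d+)/(\d+)(?=/0\R\R-{29}\RPAGEV2)'


def on_match(m):
    n1s, n2s = m.groups()
    n1, n2 = int(n1s), int(n2s)
    # Push the second number out to at least 30 past the first.
    return '{0}/{1}'.format(n1, max(n1 + 30, n2))


def fix_all_open_tabs():
    for filename, buffer_id, index, view in notepad.getFiles():
        # Assumption: only .kbp tabs should be touched.
        if not filename.lower().endswith('.kbp'):
            continue
        notepad.activateFile(filename)
        editor.rereplace(PATTERN, on_match)
        # Assumption: save right away; remove this line to review first.
        notepad.save()


if notepad is not None:
    fix_all_open_tabs()
```

Because the script saves each file as it goes, it is worth running it on copies of the files first.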