Find the maximum line length in a file
Terry R last edited by
Not sure if anyone has a need for this and indeed maybe someone has already provided this solution. However a quick search on this forum (and Google) did not find it.
It is a destructive test, so it should be carried out on a copy of the file being tested. It uses binary to help count the characters on a line. This is beneficial because binary notation can quickly count to a large number through doubling, and it is also a direct representation of the boolean `false`, which I make use of in the Replacement expression.
So to the regex:
So the above regex will only count a maximum of 511 characters on a line; however, the method of creating each subexpression in both the “Find What” and “Replace With” fields is self-explanatory and should be easy to expand on. I haven’t tested (much) to see if there is a maximum size that can be accommodated within Notepad++ using this regex, although I have tried with `8192` (that’s 14 subexpressions) as the biggest number; the line WAS 16383 characters in length and was correctly counted.
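The arithmetic behind those limits can be checked outside Notepad++. The following is a minimal Python sketch, not the regex itself: it assumes each subexpression tests for one power-of-two chunk of characters, largest first, and writes a 1 or a 0 accordingly. The names `max_countable` and `binary_length` are hypothetical, chosen only for illustration.

```python
def max_countable(subexpressions):
    """Largest line length representable with this many binary digits."""
    return 2 ** subexpressions - 1


def binary_length(line, subexpressions):
    """Return the line length as a fixed-width binary string.

    Assumption: this mimics a regex that tries to consume
    2**(n-1), ..., 4, 2, 1 characters in turn, writing 1 when a
    chunk matched and 0 when it did not.
    """
    remaining = len(line)
    digits = []
    for power in range(subexpressions - 1, -1, -1):
        chunk = 2 ** power
        if remaining >= chunk:
            digits.append("1")
            remaining -= chunk
        else:
            digits.append("0")
    return "".join(digits)


print(max_countable(9))           # 511
print(max_countable(14))          # 16383
print(binary_length("x" * 5, 9))  # 000000101
```

In this sketch a line longer than the maximum simply saturates at all ones, which is one reason to pick enough digits for the longest line you expect.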
So after processing the file each line will have a binary number on it. Sorting these lines as “integer descending” will put the largest binary number on line 1. Copying this number and pasting into the “Conversion Panel” (under Plugins, Converter) Binary field will elicit the decimal number you want.
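If the binary numbers are copied out of the editor, the sort-and-convert step can also be done in a few lines of Python. This is only a sketch under the assumption that each line holds nothing but the binary count; `longest_from_binary` is a hypothetical helper name.

```python
def longest_from_binary(binary_lines):
    """Return the decimal value of the largest binary string, ignoring blanks."""
    values = [int(text, 2) for text in binary_lines if text.strip()]
    return max(values) if values else 0


sample = ["000000101", "000101100", "000000000", "111111111"]
print(longest_from_binary(sample))  # 511
```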
So what can it be used for?
- Formatting text to be centred on a line. Find the longest line using this and then use other regexes (found on this forum) to pad out the lines in the file.
- A line number could be added to the front of each line, then this regex (adjusted) would accommodate that and add the binary number to the front. Sorting would show up ANY line numbers higher than a stipulated figure. The line number would then allow quick access to the “offending line” for editing. Yes, this can also be achieved with “Mark” and a test for any line longer than the figure stipulated.
- You supply other ideas???
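As a rough illustration of the first two ideas, here is a small Python sketch that works on a plain list of lines rather than inside Notepad++. It assumes a monospaced display with no tab characters, and the function names `centre_lines` and `lines_longer_than` are hypothetical.

```python
def centre_lines(lines):
    """Left-pad each line so it sits centred under the longest line."""
    width = max((len(line) for line in lines), default=0)
    return [" " * ((width - len(line)) // 2) + line for line in lines]


def lines_longer_than(lines, limit):
    """Return the 1-based numbers of lines whose length exceeds the limit."""
    return [number for number, line in enumerate(lines, start=1)
            if len(line) > limit]


text = ["abcdef", "ab", "abcd"]
print(centre_lines(text))          # ['abcdef', '  ab', ' abcd']
print(lines_longer_than(text, 3))  # [1, 3]
```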
Currently I only see this as a step which would provide input to some of the following steps in a larger process.
Of course, things like this always bring out alternative solutions, so here’s mine, a PythonScript solution:
```python
# -*- coding: utf-8 -*-
from Npp import editor
import re

tab_size = 4
max_len = 0
for line in editor.getText().splitlines():
    # splitlines() has already dropped the line ending;
    # rstrip() also removes any trailing whitespace
    line = line.rstrip()
    if '\t' in line:
        # count each tab as tab_size spaces
        line = re.sub(r'\t', ' ' * tab_size, line)
    L = len(line)
    if L > max_len:
        max_len = L
print(max_len)
```
Not only do we have the length of the longest line in the file, we can optionally transform tab characters into their true length for the calculation. Of course we could add the line-endings into the calculation as well, but I didn’t do that.
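One caveat on tabs: on screen, a tab advances to the next tab stop rather than always occupying `tab_size` columns, so a fixed replacement can overcount. A minimal sketch of the difference, using Python's built-in `str.expandtabs` and assuming tab stops at every multiple of the tab size (`display_length` is a hypothetical helper name):

```python
def display_length(line, tab_size=4):
    """Length of the line as displayed, with tabs expanded to tab stops."""
    return len(line.rstrip("\r\n").expandtabs(tab_size))


sample = "ab\tc"
print(display_length(sample))                # 5: the tab advances from column 2 to 4
print(len(sample.replace("\t", " " * 4)))    # 7: every tab counted as 4 spaces
```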
Terry R last edited by Terry R
Of course, things like this always bring out alternative solutions, so here’s mine, a PythonScript solution
Of course the big brother (PythonScript) is always going to win the day. IF one knows how to program it!
I suppose my discussion point wasn’t so much what can do it better than this, but “what could it be used for” or is it mostly redundant, merely a passing whimsy ;-))
I had quite a while ago attempted to answer a post related to how many lines between delimiters (finding the max number I think) and was trying to get something like this to work but gave up. Possibly buffer limits were breached. It just came to me again today in this format so I thought, why not post it and see what attention it would get.
PS I suppose I need to state the obvious, this is within the regex world ONLY!
IF one knows how to program it!
Well, just like learning regex, one has to take that first step…
Plus, the code I gave even includes some regex stuff within Python, to make the regexers feel at home. :-)
Nick Brown last edited by
Another Python Script version which makes use of one of the nice additions in the helper functions forEachLine, and gets the current tab size for the document rather than hard coding, and yes I did ‘borrow’ some of Alan’s script.
```python
from Npp import editor
import re

maxLineLength = 0

def getMaxLineLength(contents, lineNumber, totalLines):
    global maxLineLength
    tab_size = editor.getTabWidth()  # tab width of the current document
    line = contents.rstrip()  # remove line-ending character(s)
    if '\t' in line:
        line = re.sub(r'\t', ' ' * tab_size, line)
    lineLength = len(line)
    if lineLength > maxLineLength:
        maxLineLength = lineLength

editor.forEachLine(getMaxLineLength)
print(maxLineLength)
```
nice additions in the helper functions forEachLine
When I first started using PS, I noticed some weirdness with `forEachLine` that of course I can’t remember these many years later, but since then I’ve steered clear of it. Perhaps I was doing something wrong with it, or maybe there truly was something wrong with it that has since been fixed.