
    Find the maximum line length in a file

    • Terry R

      Not sure if anyone has a need for this, and someone may already have posted this solution, but a quick search of this forum (and Google) did not find it.

      It is a destructive test, so it should be carried out on a copy of the file being tested. It counts the characters on each line in binary. This is useful because binary notation reaches large numbers quickly through doubling, and each digit maps directly onto the boolean true/false test that I use in the replacement expression.

      So to the regex:
      Find What: (?-s)(.{256})?(.{128})?(.{64})?(.{32})?(.{16})?(.{8})?(.{4})?(.{2})?(.{1})?
      Replace With: (?{1}1:0)(?{2}1:0)(?{3}1:0)(?{4}1:0)(?{5}1:0)(?{6}1:0)(?{7}1:0)(?{8}1:0)(?{9}1:0)

      The regex above can only count up to 511 characters on a line. However, the way each subexpression is built in both the “Find What” and “Replace With” fields should be self-explanatory and easy to extend. I haven’t tested (much) whether there is a maximum size that Notepad++ can accommodate with this regex, although I have tried with 8192 as the biggest number (that’s 14 subexpressions): the line WAS 16383 characters long and was counted correctly.
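A minimal Python sketch of how the two fields could be generated for any number of binary digits, plus a plain-Python emulation of what the replacement produces for one line. The function names `build_patterns` and `binary_length` are hypothetical, chosen for illustration. Python's `re` module does not accept the leading `(?-s)` or the `(?{1}1:0)` conditional replacement syntax, so the emulation drops the former (`.` already excludes newlines by default) and checks the groups directly.

```python
import re

def build_patterns(bits):
    """Return (find, replace) strings in the style of the post for `bits` groups."""
    widths = [2 ** i for i in range(bits - 1, -1, -1)]  # e.g. 256, 128, ..., 1
    find = "(?-s)" + "".join("(.{%d})?" % w for w in widths)
    replace = "".join("(?{%d}1:0)" % g for g in range(1, bits + 1))
    return find, replace

def binary_length(line, bits=9):
    """Emulate the replacement for one line (without its line ending)."""
    widths = [2 ** i for i in range(bits - 1, -1, -1)]
    pattern = "".join("(.{%d})?" % w for w in widths)
    match = re.match(pattern, line)
    # A group that took part in the match contributes 1, otherwise 0.
    return "".join("1" if g is not None else "0" for g in match.groups())

find, replace = build_patterns(9)
print(find)
print(replace)
print(binary_length("a" * 300))  # 256 + 32 + 8 + 4 = 300
```

With `bits` groups the largest countable length is 2**bits - 1, which gives 511 for 9 groups and 16383 for 14. The emulation only looks at the first match, so a longer line is not counted correctly.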

      After processing the file, each line will hold a binary number. Sorting these lines as “integer descending” puts the largest binary number on line 1. Copy this number and paste it into the Binary field of the “Conversion Panel” (under Plugins, Converter) to get the decimal number you want.
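For anyone who prefers to check the result outside Notepad++, the sort-and-convert step can be sketched in Python. This assumes the processed file has been read into a list of equal-width binary strings; the helper name `longest_from_binary` and the sample data are made up for illustration.

```python
def longest_from_binary(binary_lines):
    """Return the largest value among binary strings, as a decimal integer."""
    # int(text, 2) parses a base-2 string; blank lines are skipped.
    return max(int(b, 2) for b in binary_lines if b.strip())

sample = ["000001101", "100101100", "000000000", "011111111"]
print(sorted(sample, reverse=True)[0])  # largest binary string first
print(longest_from_binary(sample))
```

Because every number has the same width, sorting the strings in descending order puts the largest value first, which is why the sort step in the editor works.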

      So what can it be used for?

      1. Formatting text to be centred on a line. Find the longest line using this and then use other regexes (found on this forum) to pad out the lines in the file.
      2. A line number could be added to the front of each line; this regex (adjusted) would accommodate that and add the binary number to the front. Sorting would then show up ANY lines longer than a stipulated figure, and the line number would allow quick access to the “offending line” for editing. Yes, this can also be achieved with “Mark” and a test for any line longer than the figure stipulated.
      3. You supply other ideas???
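The first two ideas can be sketched in Python, assuming the file's lines are already in a list without their line endings. The helper names `centre_lines` and `lines_longer_than` are hypothetical, and this is only one possible reading of those use cases.

```python
def centre_lines(lines):
    """Left-pad each line so it sits centred within the longest line's width."""
    width = max((len(line) for line in lines), default=0)
    return [" " * ((width - len(line)) // 2) + line for line in lines]

def lines_longer_than(lines, limit):
    """Return (line_number, length) pairs for lines longer than `limit`."""
    return [(number, len(line))
            for number, line in enumerate(lines, start=1)
            if len(line) > limit]

text = ["a long heading line", "short", "mid length"]
for line in centre_lines(text):
    print(line)
print(lines_longer_than(text, 8))
```

Both helpers depend only on knowing each line's length, which is the information the binary count provides.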

      Currently I only see this as a step which would provide input to some of the following steps in a larger process.

      Terry

      • Alan Kilborn @Terry R

        @Terry-R

        Interesting regex.

        Of course, things like this always bring out alternative solutions, so here’s mine, a PythonScript solution:

        # -*- coding: utf-8 -*-
        
        from Npp import editor
        import re
        
        tab_size = 4  # each tab is counted as this many columns
        max_len = 0
        for line in editor.getText().splitlines():
            line = line.rstrip('\r\n')  # remove line-ending character(s) only, keep trailing spaces
            if '\t' in line: line = re.sub(r'\t', ' ' * tab_size, line)
            L = len(line)
            if L > max_len: max_len = L
        print(max_len)
        

        Not only do we get the length of the longest line in the file, we can also count each tab character at its configured width (tab_size) rather than as a single character. Of course we could add the line endings into the calculation as well, but I didn’t do that.
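One caveat on tab width: replacing every tab with `tab_size` spaces counts each tab at its full width, whereas an editor normally advances a tab only to the next tab stop. A small sketch of the difference, using Python's built-in `str.expandtabs`; the helper name `display_width` and its `count_eol` switch are assumptions for illustration, not part of the script above.

```python
def display_width(line, tab_size=4, count_eol=False):
    """Width of a line with tabs expanded to tab stops, optionally counting the line ending."""
    body = line.rstrip("\r\n")              # strip only line-ending characters
    eol_chars = len(line) - len(body)       # 0, 1 (LF or CR) or 2 (CRLF)
    width = len(body.expandtabs(tab_size))  # tab advances to the next multiple of tab_size
    return width + eol_chars if count_eol else width

sample = "ab\tc"
print(len(sample.replace("\t", " " * 4)))  # fixed substitution: 7
print(display_width(sample))               # tab-stop expansion: 5
```

Which figure is the "right" one depends on whether the goal is on-screen column width or a simple character budget.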

        • Terry R

          @Alan-Kilborn said in Find the maximum line length in a file:

          Of course, things like this always bring out alternative solutions, so here’s mine, a PythonScript solution

          Of course the big brother (PythonScript) is always going to win the day. IF one knows how to program it!

          I suppose my discussion point wasn’t so much what can do it better than this, but “what could it be used for” or is it mostly redundant, merely a passing whimsy ;-))

          I had quite a while ago attempted to answer a post related to how many lines between delimiters (finding the max number I think) and was trying to get something like this to work but gave up. Possibly buffer limits were breached. It just came to me again today in this format so I thought, why not post it and see what attention it would get.

          Terry

          PS I suppose I need to state the obvious, this is within the regex world ONLY!

          • Alan Kilborn @Terry R

            @Terry-R said in Find the maximum line length in a file:

            IF one knows how to program it!

            Well, just like learning regex, one has to take that first step…
            Plus, the code I gave even includes some regex stuff within Python, to make the regexers feel at home. :-)

            • Nick Brown

              Re: Find the maximum line length in a file

              Another PythonScript version, which uses one of the nice helper functions, forEachLine, and gets the current tab size for the document rather than hard-coding it. And yes, I did ‘borrow’ some of Alan’s script.

              from Npp import editor
              import re

              maxLineLength = 0

              def getMaxLineLength(contents, lineNumber, totalLines):
                  global maxLineLength
                  tab_size = editor.getTabWidth()  # tab width set for the current document
                  line = contents.rstrip('\r\n')  # remove line-ending character(s) only
                  if '\t' in line: line = re.sub(r'\t', ' ' * tab_size, line)
                  lineLength = len(line)
                  if lineLength > maxLineLength: maxLineLength = lineLength

              editor.forEachLine(getMaxLineLength)

              print(maxLineLength)
              
              • Alan Kilborn @Nick Brown

                @Nick-Brown said in Find the maximum line length in a file:

                nice additions in the helper functions forEachLine

                When I first started using PS, I noticed some weirdness with forEachLine that of course I can’t remember these many years later, but since then I’ve steered clear of it. Perhaps I was doing something wrong with it, or maybe there truly was something wrong with it that has since been fixed.
