Find the maximum line length in a file

  • T
    Terry R
    last edited by Mar 17, 2021, 11:44 PM

    Not sure if anyone has a need for this, and indeed someone may already have provided this solution. However, a quick search on this forum (and Google) did not turn it up.

    It is a destructive test, so it should be carried out on a copy of the file being tested. It uses binary notation to count the characters on each line. This is beneficial because binary can reach a large number quickly through doubling, and each binary digit is a direct representation of the boolean true or false (did this group match or not) that I make use of in the Replacement expression.

    So to the regex:
    Find What:(?-s)(.{256})?(.{128})?(.{64})?(.{32})?(.{16})?(.{8})?(.{4})?(.{2})?(.{1})?
    Replace With:(?{1}1:0)(?{2}1:0)(?{3}1:0)(?{4}1:0)(?{5}1:0)(?{6}1:0)(?{7}1:0)(?{8}1:0)(?{9}1:0)

    The above regex will only count a maximum of 511 characters on a line (256 + 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1). However, the method of creating each subexpression in both the “Find What” and “Replace With” fields is self-explanatory and should be easy to expand on. I haven’t tested (much) to see if there is a maximum size that Notepad++ can accommodate with this regex, although I have tried with 8192 (that’s 14 subexpressions) as the biggest number; the line WAS 16383 characters in length and was correctly counted.
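As a rough illustration of how the two fields grow, here is a minimal Python sketch that builds both strings for a given number of binary digits. The helper name `build_length_regex` is hypothetical; it only assembles text to paste into the Find What and Replace With fields and does not run inside Notepad++.

```python
def build_length_regex(bits):
    """Build the Find What / Replace With pair for `bits` binary digits."""
    # Group widths are descending powers of two, e.g. 256, 128, ..., 2, 1.
    widths = [2 ** i for i in range(bits - 1, -1, -1)]
    find_what = "(?-s)" + "".join("(.{%d})?" % w for w in widths)
    # One conditional per group: emit 1 if the group matched, otherwise 0.
    replace_with = "".join("(?{%d}1:0)" % n for n in range(1, bits + 1))
    return find_what, replace_with


find_what, replace_with = build_length_regex(9)
print(find_what)
print(replace_with)
print("longest countable line:", 2 ** 9 - 1)
```

With `bits` set to 9 this should reproduce the pair above; with 14 the first group becomes `(.{8192})?` and the largest countable length is 2^14 - 1 = 16383.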

    After processing the file, each line will have a binary number on it. Sorting these lines as “integer descending” will put the largest binary number on line 1. Copying this number and pasting it into the Binary field of the “Conversion Panel” (under Plugins, Converter) will give the decimal number you want.
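The whole procedure can be emulated outside Notepad++ to check the arithmetic. This is only a sketch in plain Python: Python's `re` module has no equivalent of the `(?{1}1:0)` conditional replacement, so the 1/0 digits are built from the match groups instead, and the function names are hypothetical.

```python
import re

BITS = 9  # same nine groups as the Find What above: 256 down to 1
PATTERN = re.compile(
    "".join("(.{%d})?" % (2 ** i) for i in range(BITS - 1, -1, -1))
)


def line_to_binary(line):
    """Return one digit per group: 1 if the group matched, else 0."""
    match = PATTERN.match(line)  # always matches, possibly the empty string
    return "".join("1" if g is not None else "0" for g in match.groups())


def longest_line(text):
    """Emulate 'replace, sort descending, convert line 1 to decimal'."""
    codes = [line_to_binary(line) for line in text.splitlines()]
    # All codes have the same width, so string order equals numeric order.
    top = max(codes, default="0" * BITS)
    return top, int(top, 2)


sample = "a" * 300 + "\n" + "b" * 12 + "\n\n"
print(longest_line(sample))
```

For a 300-character line the groups for 256, 32, 8 and 4 match, giving `100101100`. As with the regex itself, only the first 511 characters of a line are counted by this nine-digit version.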

    So what can it be used for?

    1. Formatting text to be centred on a line. Find the longest line using this and then use other regexes (found on this forum) to pad out the lines in the file.
    2. A line number could be added to the front of each line; this regex (adjusted) would then accommodate that and add the binary number to the front. Sorting would show up ANY lines longer than a stipulated figure. The line number would then allow quick access to the “offending line” for editing. Yes, this can also be achieved with “Mark” and a test for any line longer than the figure stipulated.
    3. You supply other ideas???
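For comparison, the first two ideas in the list can be sketched in a few lines of plain Python. This is not the regex method, just an illustration of the end results; both function names are hypothetical.

```python
def center_lines(lines):
    """Pad every line with spaces so it is centred on the longest line."""
    width = max((len(line) for line in lines), default=0)
    return [line.center(width) for line in lines]


def lines_longer_than(lines, limit):
    """Return (line_number, length) pairs, 1-based, for lines over `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(lines, start=1)
        if len(line) > limit
    ]


text_lines = ["ab", "abcdef", "abc"]
print(center_lines(text_lines))
print(lines_longer_than(text_lines, 2))
```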

    Currently I only see this as a step which would provide input to some of the following steps in a larger process.

    Terry

    • A
      Alan Kilborn @Terry R
      last edited by Mar 18, 2021, 12:40 AM

      @Terry-R

      Interesting regex.

      Of course, things like this always bring out alternative solutions, so here’s mine, a PythonScript solution:

      # -*- coding: utf-8 -*-
      
      from Npp import editor
      import re
      
      tab_size = 4
      max_len = 0
      for line in editor.getText().splitlines():
          line = line.rstrip('\r\n')  # drop any line-ending character(s) but keep trailing spaces and tabs
          if '\t' in line: line = re.sub(r'\t', ' ' * tab_size, line)
          L = len(line)
          if L > max_len: max_len = L
      print(max_len)
      

      Not only do we have the length of the longest line in the file, we can optionally transform tab characters into their true length for the calculation. Of course we could add the line-endings into the calculation as well, but I didn’t do that.
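One caveat on the tab handling, shown as a small sketch: replacing every tab with a fixed run of spaces gives the worst-case width, whereas an editor normally advances a tab only to the next tab stop. Assuming that tab-stop behaviour is what is wanted, Python's built-in `str.expandtabs` gives the displayed width. The two helper names here are hypothetical.

```python
import re


def fixed_width(line, tab_size=4):
    """Length when every tab counts as exactly `tab_size` spaces."""
    return len(re.sub(r'\t', ' ' * tab_size, line))


def tab_stop_width(line, tab_size=4):
    """Length when each tab advances to the next multiple of `tab_size`."""
    return len(line.expandtabs(tab_size))


sample = "a\tb"
print(fixed_width(sample), tab_stop_width(sample))
```

For `"a\tb"` with a tab size of 4, the fixed replacement counts 6 columns while the tab-stop version counts 5. The two agree whenever every tab starts on a tab stop, for example with leading indentation only.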

      • T
        Terry R
        last edited by Terry R Mar 18, 2021, 12:52 AM Mar 18, 2021, 12:50 AM

        @Alan-Kilborn said in Find the maximum line length in a file:

        Of course, things like this always bring out alternative solutions, so here’s mine, a PythonScript solution

        Of course the big brother (PythonScript) is always going to win the day. IF one knows how to program it!

        I suppose my discussion point wasn’t so much what can do it better than this, but “what could it be used for” or is it mostly redundant, merely a passing whimsy ;-))

        Quite a while ago I attempted to answer a post about counting the lines between delimiters (finding the max number, I think) and was trying to get something like this to work, but gave up. Possibly buffer limits were breached. It just came to me again today in this format, so I thought, why not post it and see what attention it would get.

        Terry

        PS I suppose I need to state the obvious, this is within the regex world ONLY!

        • A
          Alan Kilborn @Terry R
          last edited by Mar 18, 2021, 12:58 AM

          @Terry-R said in Find the maximum line length in a file:

          IF one knows how to program it!

          Well, just like learning regex, one has to take that first step…
          Plus, the code I gave even includes some regex stuff within Python, to make the regexers feel at home. :-)

          • N
            Nick Brown
            last edited by Mar 18, 2021, 9:42 AM

            Re: Find the maximum line length in a file

            Another PythonScript version, which makes use of one of the nice additions in the helper functions, forEachLine, and gets the current tab size for the document rather than hard-coding it. And yes, I did ‘borrow’ some of Alan’s script.

            from Npp import editor
            import re

            maxLineLength = 0

            def getMaxLineLength(contents, lineNumber, totalLines):
                global maxLineLength
                tab_size = editor.getTabWidth()
                line = contents.rstrip('\r\n')  # remove line-ending character(s) only
                if '\t' in line: line = re.sub(r'\t', ' ' * tab_size, line)
                lineLength = len(line)
                if lineLength > maxLineLength: maxLineLength = lineLength

            editor.forEachLine(getMaxLineLength)

            print(maxLineLength)
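The same callback shape can be tried outside Notepad++ without a module-level global. In this sketch `for_each_line` is a hypothetical stand-in for `editor.forEachLine`: it only mimics the `(contents, lineNumber, totalLines)` callback arguments used above, passes a zero-based index as an assumption, and makes no claim to reproduce the real helper's other behaviour.

```python
def make_max_length_tracker(tab_size=4):
    """Return a forEachLine-style callback plus the dict holding its result."""
    state = {"max": 0}

    def callback(contents, line_number, total_lines):
        line = contents.rstrip("\r\n")  # strip line endings only
        line = line.replace("\t", " " * tab_size)  # same counting as the thread
        state["max"] = max(state["max"], len(line))

    return callback, state


def for_each_line(text, callback):
    """Hypothetical stand-in: call `callback` once per line, endings included."""
    lines = text.splitlines(True)  # True keeps the line endings
    for number, contents in enumerate(lines):
        callback(contents, number, len(lines))


callback, state = make_max_length_tracker(tab_size=4)
for_each_line("ab\r\n\tcd\r\nx", callback)
print(state["max"])
```

Keeping the running maximum in a small dict owned by the closure means the script can be run repeatedly without having to reset a global first.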
            
            • A
              Alan Kilborn @Nick Brown
              last edited by Mar 18, 2021, 11:44 AM

              @Nick-Brown said in Find the maximum line length in a file:

              nice additions in the helper functions forEachLine

              When I first started using PS, I noticed some weirdness with forEachLine that of course I can’t remember this many years later, but since then I’ve steered clear of it. Perhaps I was doing something wrong with it, or maybe there truly was something wrong with it that has since been fixed.
