    Splitting a text file into multiple text files at every blank line

    • Davey Clarke

      I have a large text file (190,000 lines) of agricultural data that contains the data points for each farm field, with each field's block separated from the next by a blank line. Ideally I would split this file into many files, each containing the data from a single field.

      So, in short, I want a new file to be started every time there is a blank line.

      Cheers.

      • Jacob Currie

        Hm, not sure what you're looking for within NPP. If I were you, I'd grab my favorite language and loop through your data with a separate script.

        Won't write anything for ya, but here is some pseudocode if you want to give it a try…

        original = get your file
        fileLineReader = new reader(original)
        curIndex = 2
        filename = "Farm1.txt"
        file = create filename

        while (get next 'currentLine' of fileLineReader) {
            if (currentLine isn't a blank line) {
                // append to the current file
                file.append(currentLine)
            } else {
                // blank line: start the next file
                filename = "Farm" + curIndex + ".txt"
                file = create filename
                curIndex += 1
            }
        }

        I'd use PowerShell or batch; it should only take a little time to google the correct syntax.
        Once it hits a blank line, the file it's writing to changes: Farm1, then Farm2, 3, 4, 5, 6…
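For anyone who wants to try the loop above without translating it first, here is a minimal Python sketch of the same idea. The function name `split_on_blank_lines`, its parameters, and the `Farm` prefix default are only illustrative choices, not anything from the original post. It also departs from the pseudocode in two small ways: a line containing only whitespace is treated as blank, and a run of several blank lines counts as a single separator, so no empty files get created.

```python
import os

def split_on_blank_lines(src_path, out_dir, prefix="Farm"):
    """Write each blank-line-separated block of src_path to its own file.

    Returns the list of paths written (Farm1.txt, Farm2.txt, ...).
    out_dir is assumed to exist already.
    """
    written = []
    out = None  # file currently being written, or None between blocks
    try:
        with open(src_path, "r") as src:
            for line in src:
                if not line.strip():
                    # Blank line: close the current block, if one is open.
                    if out is not None:
                        out.close()
                        out = None
                    continue
                if out is None:
                    # First line of a new block: start the next numbered file.
                    name = "{}{}.txt".format(prefix, len(written) + 1)
                    path = os.path.join(out_dir, name)
                    out = open(path, "w")
                    written.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Calling something like `split_on_blank_lines("fields.txt", "out")` would then write `out/Farm1.txt`, `out/Farm2.txt`, and so on, reading the input one line at a time so the 190,000-line file never has to fit in memory at once.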

        • Scott Sumner

          @Davey-Clarke

          I don't know if you are using 32-bit Notepad++ or the PythonScript plugin, but if you're willing to use both, the following script will do the job. Run it while the file to be split (e.g., …\myfile.txt) is the active document, and it will produce two or more related files containing the split data (e.g., …\myfile_1.txt, …\myfile_2.txt, etc.).

          I named this script SplitCurrentFileByBlankLine.py:

          import os
          import math
          
          def SCFBBL__main():
          
              pathname = notepad.getCurrentFilename()
              if pathname.lower().startswith('new '): return  # must have a real file on disk
          
              #line_delim_regex = r'^\h*\R'  # match truly empty lines OR lines containing only whitespace
              line_delim_regex = r'^\R'  # match truly empty lines ONLY
          
              match_span_tuple_list = []
          
              def match_found(m):
                  if m.start(1) != -1:
                      # delimiter starts the file, followed by non-delimiter data, plus another delimiter  (will get at most one match of this type)
                      match_span_tuple_list.append(m.span(1))
                  elif m.start(4) != -1:
                      # mid-file data plus delimiter (most matches will be of this type)
                      match_span_tuple_list.append(m.span(4))
                  elif m.start(7) != -1:
                      # end of file where no delimiter follows data (will get at most one match of this type)
                      match_span_tuple_list.append(m.span(7))
          
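              # adjacent string literals are joined before .format() runs, so every {D} below is filled in;
              # all pieces are raw strings so that \z reaches the regex engine untouched
              editor.research((r'(?s)(?:(?:{D})+(?<g1>.+?)(?<g2>(?<g3>{D})+))'
                  r'|'
                  r'(?:(?<g4>.+?)(?<g5>(?<g6>{D})+))'
                  r'|'
                  r'(?<g7>.+?\z)').format(D=line_delim_regex), match_found)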
          
              num_files_to_create = len(match_span_tuple_list)
              if num_files_to_create < 2: return  # no need to split anything
              if num_files_to_create > 10:  # warn user if large # of files is going to be created
                  answer = notepad.messageBox('There will be {} files created.\r\n\r\nCONTINUE ?'.format(num_files_to_create), '', MESSAGEBOXFLAGS.YESNO | MESSAGEBOXFLAGS.DEFBUTTON2)
                  if answer != MESSAGEBOXFLAGS.RESULTYES: return
          
              (path_part, file_part) = pathname.rsplit(os.sep, 1)
              file_without_dot_ext = file_part; ext_wo_dot = ''
              try:
                  (file_without_dot_ext, ext_wo_dot) = file_part.rsplit('.', 1)
              except ValueError: pass
              # log10 avoids the rounding error in math.log(n, 10), which gives 2.9999999999999996 for n = 1000
              num_digits = int(math.log10(num_files_to_create)) + 1
              out_file_path_str_format = '{base}_{{:0{d}}}'.format(base=file_without_dot_ext, d=num_digits)
              if len(ext_wo_dot) > 0: out_file_path_str_format += '.' + ext_wo_dot
              out_file_path_str_format = path_part + os.sep + out_file_path_str_format
          
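              # number the output files from 1 so the names match the example above (myfile_1.txt, myfile_2.txt, ...)
              for (index, (match_start_pos, match_end_pos)) in enumerate(match_span_tuple_list, 1):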
                  with open(out_file_path_str_format.format(index), 'wb') as f:
                      f.write(editor.getTextRange(match_start_pos, match_end_pos))
          
          SCFBBL__main()
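The script above only runs inside Notepad++ with the PythonScript plugin, so here is a rough standalone sketch of its two main ideas, splitting on truly empty lines and building zero-padded output names, in plain Python. The helper names `split_blocks` and `numbered_paths` are made up for this illustration, and the sketch uses Python's `re` module, not the plugin's `editor.research`. Unlike the script, it trims the line breaks at the edges of each chunk, and it numbers files from 1 to follow the `myfile_1.txt` example in the post.

```python
import os
import re

def split_blocks(text):
    """Return the chunks of text that are separated by one or more empty lines."""
    # Two or more consecutive line breaks mean at least one truly empty line.
    chunks = re.split(r'(?:\r?\n){2,}', text)
    # Drop the line breaks left at the very start or end of each chunk.
    chunks = [c.strip('\r\n') for c in chunks]
    return [c for c in chunks if c]

def numbered_paths(pathname, count):
    """Build count output paths such as myfile_01.txt ... myfile_12.txt."""
    root, ext = os.path.splitext(pathname)  # ext keeps its leading dot, or is ''
    width = len(str(count))                 # digits needed for the largest index
    return ['{0}_{1:0{2}d}{3}'.format(root, i, width, ext)
            for i in range(1, count + 1)]
```

Pairing the two with something like `zip(numbered_paths(path, len(blocks)), blocks)` gives one (output path, chunk) pair per field, which can be handy for checking what a split would produce before running it on the real file.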
          