Here’s my PythonScript script. It outputs a CSV (or TSV) file in which every row has the same number of columns, and any field that contains the column separator, a double quote or a line break is quoted so that it stays in a single column.
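Here is a quick illustration, in plain Python outside Notepad++, of why the quoting matters. `to_rfc_4180` is a copy of the script's `to_RFC_4180` helper, and `read_back` is a small helper made up for this demo that parses one line with the standard `csv` module. Joining the fields naively turns the commas inside an answer into extra columns, while the quoted row reads back as exactly the original fields:

```python
import csv
import io


def to_rfc_4180(s, sep):
    # Same rule as the script's to_RFC_4180: quote a field only if it contains
    # the separator, a double quote, CR or LF, and double any embedded quotes.
    if '"' in s or sep in s or '\r' in s or '\n' in s:
        return '"' + s.replace('"', '""') + '"'
    return s


def read_back(line, sep):
    # Parse a single line with Python's csv reader and return its fields.
    return next(csv.reader(io.StringIO(line), delimiter=sep))


row = ['1.1.1 This is question number 1?',
       '"This is answer" 1 with option 1',
       'This, is answer 1, with option 2']

naive = ','.join(row)                                    # no quoting at all
quoted = ','.join(to_rfc_4180(field, ',') for field in row)

print(len(read_back(naive, ',')))     # 5: option 2 is split at its two commas
print(read_back(quoted, ',') == row)  # True: 3 columns, text unchanged
```

Python's `csv` writer uses essentially the same minimal-quoting rule by default (`QUOTE_MINIMAL` with a `'\r\n'` line terminator), so it makes a handy cross-check if you ever change the helper.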
'''
====== SOURCE ======
Requires PythonScript 3 (https://github.com/bruderstein/PythonScript/releases), because the script uses Python 3 syntax (type annotations)
Based on this question: https://community.notepad-plus-plus.org/topic/25962/separate-question-and-answer-about-1000-topics
====== DESCRIPTION ======
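Converts a list of questions into an RFC 4180-compliant CSV file (in other words, a CSV file that most spreadsheet programs and CSV libraries can read correctly). Each question is a line that starts with a number such as 1.1.1 and ends with a question mark; the lines after it, up to the next question, are its answer options.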
====== EXAMPLE ======
Assume that you have the text below (between the ------------ lines):
------------
1.1.1 This is question number 1?
"This is answer" 1 with option 1
This, is answer 1, with option 2
1.1.2 This is question number 2?
This is answer 2, with "option 1"
This is answer 2 with option 2
This is answer 2 with option 3
1.1.3 This is question "number 3"?
This is answer 3 with option 1
1.1.4 This is question, number 4?
This is answer 4 with option 1
This is answer 4, with option 2
This is answer 4 with "option" 3
This is answer 4 with option 4
------------
This script will output the following CSV file (between the ------------ lines):
------------
question,option 1,option 2,option 3,option 4
1.1.1 This is question number 1?,"""This is answer"" 1 with option 1","This, is answer 1, with option 2",,
1.1.2 This is question number 2?,"This is answer 2, with ""option 1""",This is answer 2 with option 2,This is answer 2 with option 3,
"1.1.3 This is question ""number 3""?",This is answer 3 with option 1,,,
"1.1.4 This is question, number 4?",This is answer 4 with option 1,"This is answer 4, with option 2","This is answer 4 with ""option"" 3",This is answer 4 with option 4
------------
'''
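from Npp import editor, notepad
import json

def convert_q_list_to_csv_main():
    # this is set to ',' to make a CSV file.
    # you could instead use '\t' if you wanted a TSV (tab-separated values) file
    SEP = ','
    # each element is [question, option 1, option 2, ...]
    question_lines = []

    def to_RFC_4180(s: str, sep: str) -> str:
        # quote a field only if it contains the separator, a double quote, CR or LF,
        # and escape embedded double quotes by doubling them (RFC 4180)
        if '"' in s or sep in s or '\r' in s or '\n' in s:
            return '"' + s.replace('"', '""') + '"'
        return s

    # a question is a line like "1.1.1 Some text?"; every following line up to the
    # next question line is an answer option. (?&question) reuses the named group's pattern.
    # blank lines are skipped so they don't become empty option columns.
    editor.research(
        r"((?'question'^\d+\.\d+\.\d+ +(.*\?)$))(?:\R(?!(?&question)).*)+",
        lambda m: question_lines.append(
            [line for line in m.group(0).splitlines() if line.strip()]))
    if not question_lines:
        # max() below would raise ValueError on an empty list
        notepad.messageBox('No questions found in the current file.', 'convert_q_list_to_csv')
        return
    # debugging: show the parsed questions in the PythonScript console
    print(json.dumps(question_lines, indent=4))
    # the widest row (question + its options) sets the number of columns
    max_n_options = max(len(x) for x in question_lines)
    header_text = SEP.join(['question'] + ['option %d' % ii for ii in range(1, max_n_options)])
    out_line_texts = [header_text]
    for question in question_lines:
        RFC_4180_texts = []
        for ii in range(max_n_options):
            if ii >= len(question):
                # pad short rows so every row has the same number of columns
                RFC_4180_texts.append('')
            else:
                RFC_4180_texts.append(to_RFC_4180(question[ii], SEP))
        out_line_texts.append(SEP.join(RFC_4180_texts))
    # write the result to a new tab; RFC 4180 separates records with CRLF
    notepad.new()
    editor.setText('\r\n'.join(out_line_texts))

if __name__ == '__main__':
    convert_q_list_to_csv_main()
del convert_q_list_to_csv_main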
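If you want to try the logic outside Notepad++, here is a rough plain-Python port (a sketch, not the script itself). Python's built-in `re` module doesn't support Boost's `(?&question)` subroutine call, so this version groups lines in a loop instead: a line that looks like `1.1.1 ...?` starts a new question, and each following non-blank line is an answer option. `q_list_to_csv` and `SAMPLE` are names made up for this example; blank lines are skipped, and questions with no answer lines are dropped to mirror the `+` in the script's regex.

```python
import re

# A question line looks like "1.1.1 Some text?" (same pattern as the script's regex).
QUESTION_RE = re.compile(r'^\d+\.\d+\.\d+ +.*\?$')


def to_rfc_4180(s, sep):
    # Quote a field only if it contains the separator, a double quote, CR or LF,
    # doubling any embedded double quotes.
    if '"' in s or sep in s or '\r' in s or '\n' in s:
        return '"' + s.replace('"', '""') + '"'
    return s


def q_list_to_csv(text, sep=','):
    # Group the lines into [question, option 1, option 2, ...] lists.
    groups = []
    for line in text.splitlines():
        if QUESTION_RE.match(line):
            groups.append([line])
        elif groups and line.strip():
            groups[-1].append(line)  # answer option for the current question
    # Mirror the "+" in the script's regex: drop questions with no answers.
    groups = [g for g in groups if len(g) > 1]
    if not groups:
        return ''
    width = max(len(g) for g in groups)  # question column + most options
    rows = [sep.join(['question'] + ['option %d' % i for i in range(1, width)])]
    for g in groups:
        cells = [to_rfc_4180(cell, sep) for cell in g]
        cells += [''] * (width - len(g))  # pad so every row has `width` columns
        rows.append(sep.join(cells))
    return '\r\n'.join(rows)  # RFC 4180 separates records with CRLF


SAMPLE = '\n'.join([
    '1.1.1 This is question number 1?',
    '"This is answer" 1 with option 1',
    'This, is answer 1, with option 2',
    '1.1.2 This is question number 2?',
    'This is answer 2, with "option 1"',
    'This is answer 2 with option 2',
    'This is answer 2 with option 3',
    '1.1.3 This is question "number 3"?',
    'This is answer 3 with option 1',
    '1.1.4 This is question, number 4?',
    'This is answer 4 with option 1',
    'This is answer 4, with option 2',
    'This is answer 4 with "option" 3',
    'This is answer 4 with option 4',
])

print(q_list_to_csv(SAMPLE))
```

On the example text this prints the same CSV as shown in the script's docstring (with CRLF line breaks); passing `sep='\t'` gives the TSV variant.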
Before you ask: I made the odd design choice to define the helper function and the constants inside the main function (and to delete the main function once it has run) to avoid polluting the global PythonScript namespace.
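A tiny stand-alone illustration of that pattern, assuming (as I understand PythonScript's behaviour) that all scripts share one long-lived interpreter, so any top-level name a script creates stays around after the script finishes. `convert_main`, `SEP` and `quote` are placeholder names for this demo only:

```python
def convert_main():
    SEP = ','  # local to the function, so it never becomes a global

    def quote(s):  # nested helper, also local
        return '"' + s + '"' if SEP in s else s

    print(SEP.join(quote(s) for s in ['a', 'b,c']))  # a,"b,c"


convert_main()
del convert_main  # remove the only top-level name this "script" created

# None of the demo's names are left at module level afterwards.
print([n for n in ('SEP', 'quote', 'convert_main') if n in globals()])  # []
```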