**My question** How do I handle the extraction of the `.tex` file (as described below) to a flat `.csv` with format below ? **Posted here originally** [TeX_TA ](https://topanswers.xyz/tex?q=1521) **Context** I created all my MCQ with the `exam` package. However, exams are now 100% online... (sigh) My LaTeX file has the basic following format ``` \documentclass[12pt]{exam} \begin{document} \begin{questions} \question What is the answer ? \begin{oneparchoices} \choice 70 \choice 75 \choice 80 \CorrectChoice 85 \choice None of the above \end{oneparchoices} \question What is the answer to the second question ? \begin{oneparchoices} \choice 70 \choice 75 \CorrectChoice 80 \choice 85 \choice None of the above \end{oneparchoices} \end{questions} \end{document} ``` I need now to provide a csv where the questions of the MCQ above would be displayed like (`Incorrect`,`Correct`, just to be clear 😃 ) `question,answer1,Cor/Inc,answer2,Cor/Inc,answer3,Cor/Inc ,answer4,Cor/Inc,answer5,Cor/Inc` And it would render like `What is the answer ?,70,Inc,75,Inc,80,Inc,85,Cor,None of the above,Inc` Each line would obviously be a new question. What could correspond so far I found something interesting in python, but I am more open to a solution than a type of programming. I see the principle for environment between \begin and \end thanks to https://stackoverflow.com/questions/11054008/extract-figures-from-latex-file ``` infile = open('MCQ.tex', 'r') outfile = open('FlattenMCQ.csv', 'w') extract_block = False for line in infile: if 'begin{questions}' in line: extract_block = True if extract_block: outfile.write(line) if 'end{questions}' in line: extract_block = False outfile.write("------------------------------------------\n\n") infile.close() outfile.close() ``` Where I am stuck The recurisivity to test first `\begin{questions}` then `\question` then `\begin{oneparchoices}` then `\choice` or `\CorrectChoice`
This code's *very* messy, and only parses TeX in the very specific format you've provided, and won't always give you errors if the input document is “malformed”, but it should work: ```python import csv from functools import partial from itertools import takewhile def questions(lines): lines = iter(lines) for line in lines: if r'\begin{questions}' in line: break while True: for line in lines: if r'\end{questions}' in line: return q = line.split(r'\question', maxsplit=1) if len(q) == 2: question = q[1].strip() break if r'\begin{oneparchoices}' not in next(lines): raise ValueError(r"Expected \begin{oneparchoices}") yield question, tuple( (answer.strip(), r'Correct' in choice) for choice, answer in map( partial(str.split, maxsplit=1), takewhile( lambda line: r'\end{oneparchoices}' not in line, lines ) ) ) def q_flatten(questions): for question, answers in questions: yield (question,) + tuple(a_flatten(answers)) def a_flatten(answers): for answer, correct in answers: yield answer if correct: yield 'Cor' else: yield 'Inc' with open('MCQ.tex') as in_, \ open('FlattenMCQ.csv', 'w', newline='') as out: writer = csv.writer(out) # If you want a header, add it here. ##writer.writerow(("column1", "column2", "etc.")) writer.writerows(q_flatten(questions(in_))) ```
If you have `pip`, and can install the `TexSoup` package (pick one): ```sh > py -3 -m pip install TexSoup $ python3 -m pip install TexSoup $ python -m pip install TexSoup ``` then this would _probably_ be more resilient to TeX formatting changes, but needs to load the entire file into memory so wouldn't work for as large quizzes. ```python import csv from TexSoup import TexSoup def questions(tex): soup = TexSoup(tex) i = iter(soup[-1][0]) # \begin{questions} while True: try: next(i) # skip over \question except StopIteration: return question = next(i).strip() answers = tuple( (answer.strip(), choice.name == 'CorrectChoice') for choice, answer in zip(*(2*(iter(next(i)),))) ) yield question, answers # The below is the same as my other answer: def q_flatten(questions): for question, answers in questions: yield (question,) + tuple(a_flatten(answers)) def a_flatten(answers): for answer, correct in answers: yield answer if correct: yield 'Cor' else: yield 'Inc' with open('MCQ.tex') as in_, \ open('FlattenMCQ.csv', 'w', newline='') as out: writer = csv.writer(out) # If you want a header, add it here. ##writer.writerow(("column1", "column2", "etc.")) writer.writerows(q_flatten(questions(in_))) ```