5 years ago JeT

My question

How do I extract the questions from the .tex file (described below) into a flat .csv file with the format shown below?

Posted here originally

TeX_TA

Context

I created all my MCQs with the exam package. However, exams are now 100% online… (sigh)

My LaTeX file has the following basic format:

I now need to provide a CSV in which each question of the MCQ above is laid out as follows (Cor/Inc means Correct/Incorrect, just to be clear 😃):

question,answer1,Cor/Inc,answer2,Cor/Inc,answer3,Cor/Inc,answer4,Cor/Inc,answer5,Cor/Inc

For example, one question would be rendered as:

What is the answer ?,70,Inc,75,Inc,80,Inc,85,Cor,None of the above,Inc
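A minimal sketch of the output side, assuming each parsed question is held as a question string plus a list of (answer, is_correct) pairs. That shape and the helper names `question_to_row` and `rows_to_csv` are hypothetical, not taken from either answer. Python's standard csv module produces the layout above and quotes any field that itself contains a comma:

```python
import csv
import io


def question_to_row(question, choices):
    """Flatten one question into [question, answer1, Cor/Inc, answer2, Cor/Inc, ...].

    `choices` is assumed to be a list of (answer_text, is_correct) pairs.
    """
    row = [question]
    for text, is_correct in choices:
        row.extend([text, "Cor" if is_correct else "Inc"])
    return row


def rows_to_csv(rows):
    """Serialize rows with the csv module (fields containing commas get quoted)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


example = question_to_row(
    "What is the answer ?",
    [("70", False), ("75", False), ("80", False), ("85", True), ("None of the above", False)],
)
print(rows_to_csv([example]), end="")
# What is the answer ?,70,Inc,75,Inc,80,Inc,85,Cor,None of the above,Inc
```

Letting csv.writer do the quoting, rather than joining strings with commas, keeps an answer such as "1, 2 or 3" in a single column.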

Each line is a new question.

What I have found so far

I found something interesting in Python, but I care more about getting a solution than about which programming language it uses.

I understand the principle of extracting an environment between \begin and \end thanks to https://stackoverflow.com/questions/11054008/extract-figures-from-latex-file

Where I am stuck

The recursion needed to test first for \begin{questions}, then \question, then \begin{oneparchoices}, and then \choice or \CorrectChoice.
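Given the fixed structure described here, one way to avoid explicit recursion is to narrow the text step by step with regular expressions: take the body of the questions environment, split it on \question, then pull the \choice/\CorrectChoice entries out of each oneparchoices block. This is a sketch under stated assumptions (a single, non-nested questions block and exactly one oneparchoices block per question), not the code from either answer; `parse_questions` and the sample input are hypothetical:

```python
import re

# Hypothetical input in the exam-package style described in the question
# (the original MWE is not reproduced here).
SAMPLE = r"""
\begin{questions}
\question What is the answer ?
\begin{oneparchoices}
\choice 70
\choice 75
\choice 80
\CorrectChoice 85
\choice None of the above
\end{oneparchoices}
\question Which value is even ?
\begin{oneparchoices}
\choice 3
\CorrectChoice 4
\end{oneparchoices}
\end{questions}
"""

# Each choice runs from \choice or \CorrectChoice up to the next one (or the end).
CHOICE = re.compile(
    r"\\(choice|CorrectChoice)\b(.*?)(?=\\choice\b|\\CorrectChoice\b|\Z)", re.S
)


def parse_questions(tex):
    """Return a list of (question_text, [(answer_text, is_correct), ...]).

    Assumes a single, non-nested questions environment and exactly one
    oneparchoices block per question.
    """
    block = re.search(r"\\begin\{questions\}(.*?)\\end\{questions\}", tex, re.S)
    if block is None:
        return []
    parsed = []
    # Split on \question; the first chunk is whatever precedes the first question.
    for chunk in re.split(r"\\question\b", block.group(1))[1:]:
        stem, _, rest = chunk.partition(r"\begin{oneparchoices}")
        body = rest.partition(r"\end{oneparchoices}")[0]
        choices = [
            (" ".join(text.split()), marker == "CorrectChoice")
            for marker, text in CHOICE.findall(body)
        ]
        parsed.append((" ".join(stem.split()), choices))
    return parsed


for question, choices in parse_questions(SAMPLE):
    print(question, choices)
```

Because each level is handled by narrowing the string, the "recursion" becomes three nested passes; a real nested-environment parser is only needed if environments can appear inside one another.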

Top Answer
5 years ago wizzwizz4

This code’s very messy, and only parses TeX in the very specific format you’ve provided, and won’t always give you errors if the input document is “malformed”, but it should work:

Answer #2
5 years ago wizzwizz4

If you have pip, and can install the TexSoup package (pick one):

then this would probably be more resilient to changes in TeX formatting, but it needs to load the entire file into memory, so it won't cope with quizzes as large as the first version can.
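If memory is the concern, a line-by-line generator avoids holding the whole file. The sketch below is not the TexSoup answer; it is a hypothetical `iter_questions` helper that assumes every \question, \choice and \CorrectChoice starts its own line, that environment markers sit alone on their lines, and that there are no comments or trailing text after the last question (any other line is treated as a continuation of the current question or choice):

```python
import io


def iter_questions(lines):
    """Yield (question_text, [(answer_text, is_correct), ...]) one question at a time.

    Accepts any iterable of lines, such as an open file, so only the current
    question is kept in memory.
    """
    question, choices = None, []
    for raw in lines:
        line = raw.strip()
        # Assumption: environment markers sit on their own lines and carry no content.
        if not line or line.startswith(("\\begin{", "\\end{")):
            continue
        if line.startswith("\\question"):
            if question is not None:
                yield question, choices
            question, choices = line[len("\\question"):].strip(), []
        elif line.startswith("\\CorrectChoice"):
            choices.append((line[len("\\CorrectChoice"):].strip(), True))
        elif line.startswith("\\choice"):
            choices.append((line[len("\\choice"):].strip(), False))
        elif choices:
            # Any other line continues the previous choice...
            text, is_correct = choices[-1]
            choices[-1] = (text + " " + line, is_correct)
        elif question is not None:
            # ...or, before the first choice, the question stem.
            question += " " + line
    if question is not None:
        yield question, choices


demo = io.StringIO(
    "\\begin{questions}\n"
    "\\question What is the answer ?\n"
    "\\begin{oneparchoices}\n"
    "\\choice 70\n"
    "\\CorrectChoice 85\n"
    "\\choice None of\n"
    "the above\n"
    "\\end{oneparchoices}\n"
    "\\end{questions}\n"
)
for item in iter_questions(demo):
    print(item)
```

With a real file, the open file object (for example from `open(path, encoding="utf-8")`) can be passed in place of the StringIO; Python then reads it one line at a time.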

JeT replying to wizzwizz4 — Monday, 7th Dec 2020 22:03

I spent some time converting 350 MCQs with your code. It worked well. Thanks again!
However, I get lost with TexSoup, since the maths is not transcribed 😕

JeT replying to wizzwizz4 — Sunday, 6th Dec 2020 23:14

TexSoup piqued my curiosity. I think the solution for including maths is here

JeT replying to wizzwizz4 — Sunday, 6th Dec 2020 21:52

Agreed, even a simple $\delta$ in the TeX generates an error.

JeT replying to wizzwizz4 — Sunday, 6th Dec 2020 20:11

I was too impatient… I just tested version #2 and it seems much more flexible. As long as I have a \question, I can have as many choices as I want/need and not worry about blank lines. Great!
You know, your method will apply to pretty much anything "itemized", as long as you define it, and that's what most people use in their documents. The next question would then be translating TeX math (between $ signs, as in $x$) into something like Unicode, so that a flat text file could keep the math formatting.

JeT replying to wizzwizz4 — Sunday, 6th Dec 2020 19:59

I just read about TexSoup and latexwalker. I'm going to check your second answer tonight!

wizzwizz4 — Sunday, 6th Dec 2020 18:02

re: my second answer, I feel like this will fall over if there’s TeX in a question or answer. It copes better with new lines, though.

wizzwizz4 — Sunday, 6th Dec 2020 15:43

… I got distracted trying to find a version-agnostic bytecode assembler for optimising a single function.

wizzwizz4 — Sunday, 6th Dec 2020 14:55

And I think I’ve found a better way of doing it, so I’ll post another answer.

wizzwizz4 replying to JeT — Sunday, 6th Dec 2020 14:55

Done, thanks.

JeT replying to wizzwizz4 — Saturday, 5th Dec 2020 19:35

You should publish your answer here! It's extremely useful, and it could be generalized to other environments (itemize or enumerate, for instance). If you have an answer for nested environments, you'll have something powerful for TeX users.
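For the nested-environment case, a stack is enough: push on each \begin, pop on each \end, and attach item text to whatever environment is on top. The sketch below is a hypothetical `parse_env_tree` helper, not code from either answer; the default marker list (\item plus the exam-package commands) is an assumption, and it ignores optional arguments, comments and starred environment names such as align*:

```python
import re


def _tidy(node):
    """Collapse whitespace in every item's text, recursively."""
    for item in node["items"]:
        item["text"] = " ".join(item["text"].split())
        for child in item["children"]:
            _tidy(child)
    for child in node["children"]:
        _tidy(child)


def parse_env_tree(tex, markers=("item", "question", "choice", "CorrectChoice")):
    """Parse LaTeX source into a tree of environments and items.

    An environment is {"env", "items", "children"}; an item is
    {"marker", "text", "children"}. A nested environment is attached to the
    item it appears in, or to the environment itself if no item has started.
    """
    token = re.compile(
        r"\\begin\{(\w+)\}|\\end\{(\w+)\}|\\("
        + "|".join(map(re.escape, markers))
        + r")\b"
    )
    root = {"env": None, "items": [], "children": []}
    stack = [root]  # innermost open environment is stack[-1]
    pos = 0

    def add_text(text):
        # Text only counts once an item has started in the current environment.
        items = stack[-1]["items"]
        if items:
            items[-1]["text"] += text

    for match in token.finditer(tex):
        add_text(tex[pos:match.start()])
        pos = match.end()
        begin, end, marker = match.groups()
        if begin:
            parent = stack[-1]
            owner = parent["items"][-1] if parent["items"] else parent
            node = {"env": begin, "items": [], "children": []}
            owner["children"].append(node)
            stack.append(node)
        elif end:
            if len(stack) == 1 or stack[-1]["env"] != end:
                raise ValueError(f"unexpected \\end{{{end}}}")
            stack.pop()
        else:
            stack[-1]["items"].append({"marker": marker, "text": "", "children": []})
    add_text(tex[pos:])
    if len(stack) != 1:
        raise ValueError(f"unclosed environment {stack[-1]['env']!r}")
    _tidy(root)
    return root


example = parse_env_tree(r"""
\begin{enumerate}
\item First
\begin{itemize}
\item Sub A
\end{itemize}
\item Second
\end{enumerate}
""")
print([item["text"] for item in example["children"][0]["items"]])  # ['First', 'Second']
```

On an exam-style file, the same function should give a questions environment whose \question items each carry a oneparchoices child holding the \choice and \CorrectChoice items.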

JeT replying to wizzwizz4 — Saturday, 5th Dec 2020 19:33

It works perfectly! Thank you! I learnt a lot with your help 😃

wizzwizz4 replying to JeT — Saturday, 5th Dec 2020 17:26

Should be fixed now.

wizzwizz4 — Saturday, 5th Dec 2020 17:25

Oh. Didn’t strip the answers. 😒

wizzwizz4 — Saturday, 5th Dec 2020 17:25

Does question = q[1].strip() not strip new lines, then?
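For reference, str.strip() does remove leading and trailing newlines, but it leaves any newline inside the string untouched, which may explain why a question spread over several source lines still prints on several lines. A minimal illustration, using a hypothetical `clean` helper that collapses all internal whitespace:

```python
def clean(text):
    """Collapse every run of whitespace, including inner newlines, to one space."""
    return " ".join(text.split())


raw = "\n  What is\n the answer ?  \n"
print(repr(raw.strip()))  # 'What is\n the answer ?'
print(repr(clean(raw)))   # 'What is the answer ?'
```

str.split() with no argument splits on any run of whitespace and drops empty pieces, so rejoining with a single space gives one line per field.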

wizzwizz4 replying to JeT — Saturday, 5th Dec 2020 17:24

Hmm… I thought I’d fixed that.

JeT replying to wizzwizz4 — Saturday, 5th Dec 2020 13:08

OK for the maths. As a beginner in Python, I'll add complexity step by step 😃
I don't agree with you: your code is extremely clear! I have to say you gave me ideas (like putting all questions and answers in a proper df, which would let me index them and control the number of questions directly from Python).
Probably a stupid question regarding the output. I get

but would it be difficult to get one line per question and its answers?

wizzwizz4 — Saturday, 5th Dec 2020 12:36

If you ever want to do that “replace maths” thing, write a function to do it to a string (probably with the re module, or str.replace), then call it on question in line 2 of q_flatten and answer on line 2 of a_flatten.
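A minimal sketch of such a function using the re module. This is not wizzwizz4's code: the name `replace_maths`, the symbol table and the superscript handling are assumptions. It only handles inline $...$ (no escaped \$ and no $$...$$ display maths) and single-digit powers, and it leaves unknown commands as written:

```python
import re

# Hypothetical symbol table; extend it with the commands used in your own files.
TEX_TO_UNICODE = {
    r"\alpha": "α",
    r"\beta": "β",
    r"\delta": "δ",
    r"\theta": "θ",
    r"\pi": "π",
    r"\times": "×",
    r"\leq": "≤",
    r"\geq": "≥",
    r"\pm": "±",
}
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"

INLINE_MATH = re.compile(r"\$(.+?)\$")        # assumes no escaped \$ and no $$...$$
COMMAND = re.compile(r"\\[A-Za-z]+")          # a whole control word such as \theta
DIGIT_POWER = re.compile(r"\^\{?([0-9])\}?")  # ^2 or ^{2}, single digit only


def _convert(match):
    body = match.group(1)
    # Known commands become Unicode; unknown ones are left exactly as written.
    body = COMMAND.sub(lambda m: TEX_TO_UNICODE.get(m.group(0), m.group(0)), body)
    # Single-digit powers become Unicode superscripts.
    return DIGIT_POWER.sub(lambda m: SUPERSCRIPTS[int(m.group(1))], body)


def replace_maths(text):
    """Replace every inline $...$ span with a rough Unicode rendering."""
    return INLINE_MATH.sub(_convert, text)


print(replace_maths(r"Find $\theta$ when $x^2 \leq \pi$"))  # Find θ when x² ≤ π
```

Matching whole control words with `\\[A-Za-z]+` and then looking them up avoids replacing a short command inside a longer one (for example \pi inside a hypothetical \pid).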

JeT replying to wizzwizz4 — Saturday, 5th Dec 2020 12:33

Your assumptions are correct! (It actually shows me I should split my files by theme.)
I am testing your code right now and it works super well!
I'll run other tests with $\theta$ and other symbols (I use many of them in my field) and get back to you if I have any questions. Thanks, wizzwizz4 😃

wizzwizz4 — Saturday, 5th Dec 2020 12:21

My program’s atrocious (using advanced tricks like partial to stop me having to re-order my thoughts) and makes massive assumptions (e.g. there is only one \begin{questions} block, there are no blank lines within a \begin{oneparchoices} block, nothing is ever nested), but it worked when I tested it.

JeT replying to wizzwizz4 — Saturday, 5th Dec 2020 11:44

Of course 😃 Question updated. My document has 500 such questions one after the other.
I imagine that, for someone good at Python, it could easily be parsed into a structured df.

wizzwizz4 replying to JeT — Saturday, 5th Dec 2020 11:15

Could you provide an example with two questions, please?

JeT — Friday, 4th Dec 2020 16:58

I posted an MWE with one question but, to give you some context, I have 500 questions like this in the same file.
