tanmmayyy committed
Commit a50befe · 1 Parent(s): c0a212c

for deployment
.devcontainer/devcontainer.json DELETED
@@ -1,33 +0,0 @@
-{
-  "name": "Python 3",
-  // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
-  "image": "mcr.microsoft.com/devcontainers/python:1-3.11-bookworm",
-  "customizations": {
-    "codespaces": {
-      "openFiles": [
-        "README.md",
-        "app/main.py"
-      ]
-    },
-    "vscode": {
-      "settings": {},
-      "extensions": [
-        "ms-python.python",
-        "ms-python.vscode-pylance"
-      ]
-    }
-  },
-  "updateContentCommand": "[ -f packages.txt ] && sudo apt update && sudo apt upgrade -y && sudo xargs apt install -y <packages.txt; [ -f requirements.txt ] && pip3 install --user -r requirements.txt; pip3 install --user streamlit; echo '✅ Packages installed and Requirements met'",
-  "postAttachCommand": {
-    "server": "streamlit run app/main.py --server.enableCORS false --server.enableXsrfProtection false"
-  },
-  "portsAttributes": {
-    "8501": {
-      "label": "Application",
-      "onAutoForward": "openPreview"
-    }
-  },
-  "forwardPorts": [
-    8501
-  ]
-}
.gitignore CHANGED
@@ -22,4 +22,6 @@ __pycache__/
 Thumbs.db
 
 # Jupyter checkpoints
-.ipynb_checkpoints/
+.ipynb_checkpoints/
+
+.env
.python_version DELETED
@@ -1 +0,0 @@
-3.11
.streamlit/config.toml DELETED
@@ -1,6 +0,0 @@
-[server]
-headless = true
-port = 8501
-
-[theme]
-base = "light"
README.md CHANGED
@@ -1,10 +1,549 @@
 ---
-title: MCQ Generator
-emoji: 📝
-colorFrom: blue
-colorTo: purple
-sdk: streamlit
-sdk_version: 1.33.0
-app_file: app/main.py
-pinned: false
----
+# 📝 MCQ Generator — Automatic Multiple Choice Question Generator
+
+> **An end-to-end NLP pipeline that reads any text passage and automatically generates a complete multiple-choice quiz with scoring and explanations.**
+
+Built as a course project for an NLP curriculum covering Modules I–IV: tokenization, word embeddings, transformers, and natural language generation.
+
 ---
+
+## 📌 Table of Contents
+
+1. [What This Project Does](#what-this-project-does)
+2. [Live Demo](#live-demo)
+3. [How It Works — The Full Pipeline](#how-it-works--the-full-pipeline)
+4. [NLP Techniques Used](#nlp-techniques-used)
+5. [Project Structure](#project-structure)
+6. [Each File Explained](#each-file-explained)
+7. [Tech Stack](#tech-stack)
+8. [Setup & Installation](#setup--installation)
+9. [Running the App](#running-the-app)
+10. [Testing Each Module](#testing-each-module)
+11. [Sample Output](#sample-output)
+12. [What Makes a Good Passage](#what-makes-a-good-passage)
+13. [Known Limitations](#known-limitations)
+14. [Future Work](#future-work)
+15. [Related Research](#related-research)
+16. [Course Outcomes Covered](#course-outcomes-covered)
+
+---
+
+## What This Project Does
+
+Given any factual text passage, this system:
+
+1. **Extracts** the most important sentences using TF-IDF ranking
+2. **Identifies** answer candidates using Named Entity Recognition (NER)
+3. **Generates** natural-language questions using a T5 transformer model
+4. **Creates** plausible wrong options (distractors) using WordNet and NER
+5. **Presents** an interactive quiz with scoring and per-question explanations
+
+**Example:**
+
+Input passage:
+```
+Albert Einstein was born on March 14, 1879, in Ulm, Germany.
+He was awarded the Nobel Prize in Physics in 1921 for his
+discovery of the photoelectric effect.
+```
+
+Generated MCQ:
+```
+Q: Where was Albert Einstein born?
+
+A. France
+B. Germany ✓
+C. United States
+D. Switzerland
+```
+
+---
+
+## Live Demo
+
+```bash
+streamlit run app/main.py
+```
+
+Opens at `http://localhost:8501` in your browser.
+
+---
+
+## How It Works — The Full Pipeline
+
+```
+Raw Text Passage
+        │
+        ▼
+┌───────────────────────────────────────────────┐
+│  STEP 1: PREPROCESSING (preprocessor.py)      │
+│                                               │
+│  • Split into sentences (spaCy)               │
+│  • Rank by TF-IDF score (scikit-learn)        │
+│  • Extract Named Entities (spaCy NER)         │
+│  • Filter answer candidates (blacklist)       │
+└───────────────────┬───────────────────────────┘
+                    │ top sentences + answer candidates
+                    ▼
+┌───────────────────────────────────────────────┐
+│  STEP 2: QUESTION GENERATION                  │
+│  (question_generator.py)                      │
+│                                               │
+│  • Highlight answer in sentence with <hl>     │
+│  • Feed to T5 transformer model               │
+│  • Generate 3 candidate questions             │
+│  • Validate: reject circular/vague Qs         │
+└───────────────────┬───────────────────────────┘
+                    │ (question, answer) pairs
+                    ▼
+┌───────────────────────────────────────────────┐
+│  STEP 3: DISTRACTOR GENERATION                │
+│  (distractor_generator.py)                    │
+│                                               │
+│  Strategy 1: Same-type NER entities           │
+│              from the passage                 │
+│  Strategy 2: WordNet hyponym siblings         │
+│  Strategy 3: Cross-label fallback             │
+└───────────────────┬───────────────────────────┘
+                    │ 3 wrong options per question
+                    ▼
+┌───────────────────────────────────────────────┐
+│  STEP 4: MCQ ASSEMBLY + VALIDATION            │
+│  (mcq_builder.py)                             │
+│                                               │
+│  • Combine answer + distractors               │
+│  • Shuffle options randomly                   │
+│  • Quality gate: dedup, similarity check      │
+│  • Return list of MCQ objects                 │
+└───────────────────┬───────────────────────────┘
+                    │ validated MCQ list
+                    ▼
+┌───────────────────────────────────────────────┐
+│  STEP 5: QUIZ UI + SCORING                    │
+│  (app/main.py + evaluator.py)                 │
+│                                               │
+│  • Streamlit 3-screen app                     │
+│  • Input → Quiz → Results                     │
+│  • Score, feedback, explanations              │
+└───────────────────────────────────────────────┘
+```
+
+---
+
+## NLP Techniques Used
+
+### Module I — Foundational NLP
+| Technique | Where Used | Purpose |
+|---|---|---|
+| Tokenization | `preprocessor.py` | Split text into sentences and tokens using spaCy |
+| Lemmatization | `preprocessor.py` | Normalize word forms for TF-IDF |
+| Stop word removal | `preprocessor.py` | Filter noise before TF-IDF scoring |
+| Named Entity Recognition (NER) | `preprocessor.py` | Find PERSON, ORG, DATE, GPE as answer candidates |
+| POS Tagging | `preprocessor.py` | Identify nouns and proper nouns |
+| WordNet | `distractor_generator.py` | Find semantically related words as distractors |
+| Synsets / Hyponyms | `distractor_generator.py` | Navigate the WordNet hierarchy for same-category words |
+
+### Module II — Word Representation
+| Technique | Where Used | Purpose |
+|---|---|---|
+| TF-IDF | `preprocessor.py` | Rank sentences by information density |
+| Word Embeddings (GloVe) | `distractor_generator.py` | Optional cosine-similarity-based distractor finding |
+
+**TF-IDF explained:**
+- **TF (Term Frequency)** = how often a word appears in *this* sentence
+- **IDF (Inverse Document Frequency)** = how rare the word is across *all* sentences
+- A high TF-IDF score means the sentence contains rare, informative words → a good question source
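The ranking idea above can be sketched in a few lines of plain Python (the project itself uses scikit-learn's `TfidfVectorizer`; this is only an illustration of the scoring, not the project's code):

```python
import math
from collections import Counter

def rank_sentences(sentences, top_n=2):
    """Score each sentence by the average TF-IDF weight of its words."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # IDF: words appearing in fewer sentences get a higher weight
    vocab = {w for d in docs for w in d}
    idf = {w: math.log(n / sum(1 for d in docs if w in d)) for w in vocab}
    scored = []
    for sentence, words in zip(sentences, docs):
        tf = Counter(words)
        score = sum((tf[w] / len(words)) * idf[w] for w in tf)
        scored.append((score, sentence))
    # Highest-scoring sentences are the best question sources
    return [s for _, s in sorted(scored, reverse=True)[:top_n]]

sents = [
    "ISRO was founded in 1969 by Vikram Sarabhai.",
    "It is an organisation.",
    "It is an organisation of India.",
]
top = rank_sentences(sents, top_n=1)  # the fact-dense first sentence wins
```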
+
+### Module III — Deep Learning for NLP
+| Technique | Where Used | Purpose |
+|---|---|---|
+| Transformers | `question_generator.py` | T5 model for question generation |
+| Transfer Learning | `question_generator.py` | Uses a pre-trained T5 fine-tuned on SQuAD |
+| Seq2Seq | `question_generator.py` | Encoder–decoder architecture of T5 |
+| Beam Search | `question_generator.py` | Generate multiple question candidates, pick the best |
+
+### Module IV — Advanced NLP
+| Technique | Where Used | Purpose |
+|---|---|---|
+| T5 (Text-to-Text Transfer Transformer) | `question_generator.py` | State-of-the-art QG model |
+| Natural Language Generation (NLG) | `question_generator.py` | Generating grammatical questions |
+| Subword Tokenization (SentencePiece) | `question_generator.py` | T5's tokenizer handles rare/unknown words |
+| Pre-trained Models | `question_generator.py` | `valhalla/t5-small-qg-hl` from Hugging Face |
+
+---
+
+## Project Structure
+
+```
+mcq_generator/
+│
+├── src/                        # Core NLP pipeline modules
+│   ├── __init__.py
+│   ├── preprocessor.py         # Text cleaning, TF-IDF, NER, answer extraction
+│   ├── question_generator.py   # T5-based question generation
+│   ├── distractor_generator.py # WordNet + NER distractor generation
+│   ├── mcq_builder.py          # Pipeline orchestrator + MCQ dataclass
+│   └── evaluator.py            # Answer checking and scoring
+│
+├── app/                        # Streamlit web application
+│   ├── __init__.py
+│   ├── main.py                 # 3-screen app: input → quiz → results
+│   └── components.py           # Reusable UI components
+│
+├── data/
+│   └── sample_passages.json    # 5 test passages (ISRO, Gandhi, AI, etc.)
+│
+├── models/                     # (gitignored) Downloaded model files
+│   └── README.md
+│
+├── notebooks/                  # Jupyter notebooks for exploration
+│
+├── config.py                   # All settings in one place
+├── requirements.txt            # Python dependencies
+└── README.md                   # This file
+```
+
+---
+
+## Each File Explained
+
+### `config.py`
+Central settings file. Every other module imports from here.
+- Model name, number of questions, sentence count, file paths
+- Change values here to tune the entire system without touching logic files
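A settings module of this shape might look like the following sketch. Only `APP_TITLE`, `APP_ICON`, and `MAX_QUESTIONS` are confirmed by the `app/main.py` imports; every other name and value here is illustrative:

```python
# config.py — central knobs for the pipeline (illustrative sketch; values
# other than the three imported by app/main.py are assumptions)

APP_TITLE = "MCQ Generator"
APP_ICON = "📝"

MODEL_NAME = "valhalla/t5-small-qg-hl"  # Hugging Face model id used for QG
SPACY_MODEL = "en_core_web_sm"          # spaCy pipeline for NER / POS tagging

MAX_QUESTIONS = 10    # upper bound of the question-count slider
TOP_SENTENCES = 8     # how many TF-IDF-ranked sentences to keep
NUM_DISTRACTORS = 3   # wrong options generated per question
```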
+
+### `src/preprocessor.py`
+The NLP foundation of the project.
+
+**Key functions:**
+- `extract_sentences(text)` — spaCy sentence-boundary detection
+- `rank_sentences(sentences)` — TF-IDF scoring; returns the top N most informative sentences
+- `extract_answer_candidates(sentence)` — NER-based extraction with strict quality filters
+- `preprocess(text)` — full pipeline; returns a structured dict
+
+**Design decisions:**
+- Only `PERSON`, `ORG`, `GPE`, `DATE`, `EVENT`, `WORK_OF_ART` NER labels are accepted as answers
+- A `BLACKLIST` of 30+ generic words ("annual", "various", "Moon") prevents trivial answers
+- Answers are sorted by priority: PERSON > ORG/GPE > DATE > others
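The label whitelist, blacklist, and priority ordering described above can be sketched without spaCy by operating on `(text, label)` entity pairs (the exact lists and helper names in `preprocessor.py` may differ):

```python
ALLOWED = {"PERSON", "ORG", "GPE", "DATE", "EVENT", "WORK_OF_ART"}
BLACKLIST = {"annual", "various", "moon"}      # excerpt; the real list has 30+ words
PRIORITY = {"PERSON": 0, "ORG": 1, "GPE": 1, "DATE": 2}  # lower = preferred

def filter_answer_candidates(entities):
    """entities: list of (text, ner_label) pairs, e.g. taken from doc.ents."""
    kept = [
        (text, label) for text, label in entities
        if label in ALLOWED and text.lower() not in BLACKLIST
    ]
    # Stable sort: PERSON first, then ORG/GPE, then DATE, then everything else
    return sorted(kept, key=lambda e: PRIORITY.get(e[1], 3))

ents = [("Moon", "LOC"), ("2008", "DATE"), ("ISRO", "ORG"), ("Vikram Sarabhai", "PERSON")]
candidates = filter_answer_candidates(ents)  # "Moon" is dropped, PERSON comes first
```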
+
+### `src/question_generator.py`
+Uses the `valhalla/t5-small-qg-hl` model — a T5-small fine-tuned on SQuAD for question generation.
+
+**How T5 QG works:**
+```
+Input:  "generate question: ISRO was founded in <hl> 1969 <hl> by Vikram Sarabhai."
+Output: "In what year was ISRO founded?"
+```
+
+**Key functions:**
+- `highlight_answer(sentence, answer)` — wraps the answer in `<hl>` tags
+- `generate_question(sentence, answer)` — beam search with 5 beams, 3 candidates
+- `answer_is_addressable(question, answer)` — rejects circular, vague, or too-short questions
+
+**Quality filters applied:**
+- Must start with a question word (what/who/when/where/which/how)
+- The answer must NOT appear in the question
+- Abbreviation-trap detection (e.g. rejects Q: "What does ISRO stand for?" when A is the full name)
+- Minimum 5 words
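The four filters listed above can be encoded roughly as follows (a sketch; the actual rules and thresholds in `question_generator.py` may be stricter):

```python
QUESTION_WORDS = ("what", "who", "when", "where", "which", "how")

def passes_quality_filters(question, answer):
    q = question.strip().lower()
    # 1. Must start with a question word
    if not q.startswith(QUESTION_WORDS):
        return False
    # 2. Minimum length of 5 words
    if len(q.split()) < 5:
        return False
    # 3. Circularity: the answer must not leak into the question
    if answer.lower() in q:
        return False
    # 4. Abbreviation trap: "stand for" questions are circular when the
    #    answer is the expansion of an abbreviation used in the question
    if "stand for" in q:
        return False
    return True

ok = passes_quality_filters("When was the Chandrayaan-1 mission launched?", "2008")
bad = passes_quality_filters("What does ISRO stand for?", "Indian Space Research Organisation")
```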
+
+### `src/distractor_generator.py`
+Generates 3 plausible wrong answer options using a priority-based strategy chain.
+
+**Strategy 1 — Same-label NER (best):**
+Finds other entities of the same NER type in the passage.
+```
+Answer: "1969" (DATE)              → Distractors: ["1975", "2008", "2023"] (other DATEs in the passage)
+Answer: "Vikram Sarabhai" (PERSON) → Distractors: ["Kalam", "Dhawan", "Nehru"]
+```
+
+**Strategy 2 — WordNet hyponyms:**
+Navigates the WordNet hierarchy to find sibling words in the same semantic category.
+```
+Answer: "India" → hypernym: "country" → hyponyms: ["China", "Brazil", "Pakistan"]
+```
+
+**Strategy 3 — Cross-label fallback:**
+Uses any other named entity from the passage if strategies 1 and 2 fail.
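The three-step fallback chain can be sketched as one function. The real module queries spaCy NER and NLTK's WordNet; here those lookups are replaced by precomputed inputs (`passage_entities`, `wordnet_siblings`) so the control flow stands alone:

```python
def generate_distractors(answer, label, passage_entities, wordnet_siblings=None, k=3):
    """Priority chain: same-label NER -> WordNet siblings -> any other entity.

    passage_entities: dict mapping NER label -> list of entity strings.
    wordnet_siblings: precomputed sibling words for the answer (stand-in for
    the real WordNet hyponym lookup, which needs nltk's corpus download).
    """
    # Strategy 1: other entities of the same NER type
    pool = [e for e in passage_entities.get(label, []) if e != answer]
    # Strategy 2: WordNet hyponym siblings, if we still need more
    if len(pool) < k and wordnet_siblings:
        pool += [w for w in wordnet_siblings if w != answer and w not in pool]
    # Strategy 3: cross-label fallback — any other entity in the passage
    if len(pool) < k:
        for ents in passage_entities.values():
            pool += [e for e in ents if e != answer and e not in pool]
    return pool[:k]

ents = {"DATE": ["1969", "1975", "2008", "2023"], "PERSON": ["Vikram Sarabhai"]}
distractors = generate_distractors("1969", "DATE", ents)  # same-label DATEs suffice
```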
+
+### `src/mcq_builder.py`
+The single entry point that the UI calls. Orchestrates the entire pipeline.
+
+**MCQ dataclass:**
+```python
+@dataclass
+class MCQ:
+    question       : str
+    options        : list   # 4 shuffled options
+    correct_index  : int    # index of the correct answer (0–3)
+    correct_answer : str
+    explanation    : str    # original sentence
+```
+
+**Quality gate `is_valid_mcq()`:**
+- No two options can be too similar (catches "WWE" vs "World Wrestling Entertainment")
+- The answer must appear exactly once in the options
+- At most 1 generic placeholder option is allowed
+- The answer must not appear in the question text
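A minimal sketch of such a quality gate, using `difflib` for the pairwise similarity check (the 0.8 threshold is an assumption, not the project's actual value):

```python
from difflib import SequenceMatcher

def is_valid_mcq(question, options, answer):
    """Sketch of the quality gate described above."""
    # The answer must appear exactly once among the options
    if options.count(answer) != 1:
        return False
    # The answer must not leak into the question text
    if answer.lower() in question.lower():
        return False
    # No two options may be near-duplicates of each other
    for i in range(len(options)):
        for j in range(i + 1, len(options)):
            sim = SequenceMatcher(None, options[i].lower(), options[j].lower()).ratio()
            if sim > 0.8:  # assumed similarity threshold
                return False
    return True

good = is_valid_mcq("Who founded ISRO?",
                    ["Kalam", "Nehru", "Vikram Sarabhai", "Dhawan"], "Vikram Sarabhai")
dup = is_valid_mcq("Who founded ISRO?",
                   ["Kalam", "Kalam ", "Vikram Sarabhai", "Dhawan"], "Vikram Sarabhai")
```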
+
+### `src/evaluator.py`
+Checks answers and computes scores.
+
+**Returns:**
+```python
+{
+    "score"     : 7,
+    "total"     : 10,
+    "percentage": 70.0,
+    "feedback"  : "Good effort! Review the explanations...",
+    "results"   : [ ... ]   # per-question breakdown
+}
+```
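A scorer returning that shape can be sketched as follows (the feedback thresholds and strings are assumptions; `mcqs` are shown as dicts rather than the project's `MCQ` dataclass):

```python
def score_quiz(mcqs, user_answers):
    """mcqs carry a correct_index; user_answers hold the selected option
    index per question (-1 = unanswered)."""
    results = []
    for mcq, picked in zip(mcqs, user_answers):
        results.append({
            "question": mcq["question"],
            "picked": picked,
            "correct_index": mcq["correct_index"],
            "is_correct": picked == mcq["correct_index"],
            "explanation": mcq["explanation"],
        })
    score = sum(r["is_correct"] for r in results)
    total = len(results)
    pct = round(100.0 * score / total, 1) if total else 0.0
    if pct >= 80:
        feedback = "Excellent!"
    elif pct >= 50:
        feedback = "Good effort! Review the explanations..."
    else:
        feedback = "Keep practising."
    return {"score": score, "total": total, "percentage": pct,
            "feedback": feedback, "results": results}

quiz = [
    {"question": "Who founded ISRO?", "correct_index": 2, "explanation": "..."},
    {"question": "When was Aryabhata launched?", "correct_index": 0, "explanation": "..."},
]
result = score_quiz(quiz, [2, 1])  # one right, one wrong
```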
+
+### `app/main.py`
+Streamlit app with 3 screens managed via `st.session_state`:
+- **Screen 1 (input):** text area + question-count slider + Generate button
+- **Screen 2 (quiz):** one question at a time, radio buttons, Previous/Next/Submit
+- **Screen 3 (results):** score banner + per-question feedback with explanations
+
+### `app/components.py`
+Reusable display functions:
+- `render_question_card()` — A/B/C/D-labelled radio buttons
+- `render_result_card()` — green (correct) / red (wrong) with explanation
+- `render_score_summary()` — score banner + metric cards
+
+---
+
+## Tech Stack
+
+| Library | Version | Purpose |
+|---|---|---|
+| `spaCy` | 3.7.4 | Tokenization, NER, POS tagging, sentence splitting |
+| `transformers` | 4.38.2 | T5 model for question generation |
+| `torch` | 2.2.1 | PyTorch backend for transformers |
+| `nltk` | 3.8.1 | WordNet access for distractor generation |
+| `scikit-learn` | 1.4.1.post1 | TF-IDF vectorizer |
+| `sentencepiece` | latest | T5's subword tokenizer |
+| `streamlit` | 1.33.0 | Web UI framework |
+| `gensim` | 4.3.2 | Word2Vec / GloVe loading (optional) |
+| `numpy` | 1.26.4 | TF-IDF matrix operations |
+
+**Pre-trained model used:**
+- `valhalla/t5-small-qg-hl` — a T5-small fine-tuned on SQuAD 1.0 for answer-aware question generation using the highlight format. Hosted on the Hugging Face Hub and downloaded automatically on first run (~240 MB).
+
+---
+
+## Setup & Installation
+
+### Prerequisites
+- Python 3.11+
+- pip
+- Internet connection (the first run downloads the T5 model)
+
+### Step 1 — Clone the repository
+```bash
+git clone https://github.com/tanmmayyy/mcq-generator.git
+cd mcq-generator
+```
+
+### Step 2 — Create a virtual environment
+```bash
+python -m venv myenv
+
+# Windows
+myenv\Scripts\activate
+
+# Mac/Linux
+source myenv/bin/activate
+```
+
+### Step 3 — Install dependencies
+```bash
+pip install -r requirements.txt
+pip install sentencepiece   # required for the T5 tokenizer
+```
+
+### Step 4 — Download the spaCy language model
+```bash
+python -m spacy download en_core_web_sm
+
+# If the default command fails:
+pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
+```
+
+### Step 5 — Verify the installation
+```bash
+python -c "import spacy; nlp = spacy.load('en_core_web_sm'); print('spaCy OK')"
+python -c "from transformers import pipeline; print('Transformers OK')"
+```
+
+---
+
+## Running the App
+
+```bash
+streamlit run app/main.py
+```
+
+The app opens at `http://localhost:8501`. On first launch, the T5 model downloads (~240 MB) and loads into memory, which takes 1–2 minutes. Subsequent launches are fast.
+
+---
+
+## Testing Each Module
+
+Run these in order to verify that each step of the pipeline works independently:
+
+```bash
+# Step 1 — Test preprocessing (NER, TF-IDF, sentence ranking)
+python src/preprocessor.py
+
+# Step 2 — Test question generation (T5 model)
+python src/question_generator.py
+
+# Step 3 — Test distractor generation (WordNet + NER)
+python src/distractor_generator.py
+
+# Step 4 — Test the full pipeline end-to-end
+python src/mcq_builder.py
+
+# Step 5 — Test scoring
+python src/evaluator.py
+```
+
+---
+
+## Sample Output
+
+**Input passage (ISRO):**
+```
+The Indian Space Research Organisation (ISRO) was founded in 1969 by Vikram Sarabhai.
+ISRO developed India's first satellite, Aryabhata, which was launched in 1975.
+The Chandrayaan-1 mission in 2008 discovered water molecules on the Moon.
+In 2023, Chandrayaan-3 successfully landed near the lunar south pole.
+The Mars Orbiter Mission, also called Mangalyaan, was launched in 2013.
+```
+
+**Generated questions:**
+
+```
+Q1: Who founded ISRO?
+    A. Jawaharlal Nehru
+    B. APJ Abdul Kalam
+    C. Vikram Sarabhai ✓
+    D. Homi Bhabha
+
+Q2: What was India's first satellite called?
+    A. Chandrayaan
+    B. Mangalyaan
+    C. Rohini
+    D. Aryabhata ✓
+
+Q3: When did the Chandrayaan-1 mission take place?
+    A. 1975
+    B. 2013
+    C. 2023
+    D. 2008 ✓
+
+Q4: What mission made India the first Asian country to reach Mars orbit?
+    A. Chandrayaan-3
+    B. Aryabhata
+    C. Mangalyaan ✓
+    D. Chandrayaan-1
+```
+
+---
+
+## What Makes a Good Passage
+
+The system performs best on **factual passages** that contain:
+
+| Works well | Works poorly |
+|---|---|
+| People's names (PERSON entities) | Opinion / descriptive text |
+| Specific dates (DATE entities) | Passages with repeated entities |
+| Organisation names (ORG entities) | Very short passages (< 5 sentences) |
+| Place names (GPE entities) | Abstract / philosophical text |
+| One clear fact per sentence | Sentences with multiple facts |
+
+**Best passage types:** history, science, geography, biographies, Wikipedia-style articles
+
+**Avoid:** opinion pieces, marketing content, descriptive narratives without specific facts
+
+---
+
+## Known Limitations
+
+1. **Passage-type dependency** — works best on factual text. Descriptive or opinion text produces poor questions because there are no named entities to use as answers.
+
+2. **T5-small quality ceiling** — the model used (`t5-small`) has 60M parameters. Larger models such as `t5-base` or `t5-large` would produce better questions but require more memory and time.
+
+3. **Distractor diversity** — when a passage has few named entities, distractors may fall back to generic options. Fine-tuning a separate T5 model on the RACE dataset for distractor generation would fix this.
+
+4. **English only** — the current pipeline supports only English text. Extending it to Hindi or other Indic languages would require multilingual spaCy models and a multilingual QG model.
+
+5. **No semantic deduplication** — two questions from the same passage can be semantically similar even when worded differently.
+
+---
+
+## Future Work
+
+- [ ] Fine-tune a T5 distractor-generation model on the RACE dataset (100k exam questions)
+- [ ] Add support for Hindi using IndicNLP + multilingual BERT
+- [ ] Add PDF upload support so users can quiz themselves on any document
+- [ ] Automated BLEU/METEOR/ROUGE evaluation of generated questions
+- [ ] Per-question difficulty scoring based on distractor plausibility
+- [ ] Export the quiz as a PDF for offline use
+
+---
+
+## Related Research
+
+Papers that take similar approaches, cited for comparison:
+
+1. **Automatic Generation of Multiple-Choice Questions (2023)**
+   Zhang et al. — T5 with pre-/post-processing pipelines for MCQ generation
+   https://arxiv.org/abs/2303.14576
+
+2. **Deep Learning and Linguistic Feature Based Automatic MCQ Generation (Springer, ICDCIT 2022)**
+   Agarwal et al. — DL + linguistic features for MCQ generation (same 3-step pipeline)
+   https://link.springer.com/chapter/10.1007/978-3-030-94876-4_18
+
+3. **End-to-End MCQ Generation Using T5 (ScienceDirect 2022)**
+   Rodriguez-Torrealba et al. — full T5-based pipeline with Wikipedia passages
+   https://www.sciencedirect.com/science/article/pii/S0957417422014014
+
+4. **Leaf — MCQ Generation System (ECIR 2022)**
+   Vachev et al. — two fine-tuned T5 models: one for QG, one for DG on RACE
+   https://github.com/KristiyanVachev/Leaf-Question-Generation
+
+5. **Automatic Distractor Generation — Systematic Review (PMC 2024)**
+   Comprehensive review of distractor-generation methods, including WordNet and T5
+   https://pmc.ncbi.nlm.nih.gov/articles/PMC11623049/
+
+6. **Automatic Question Generation: A Review (Springer/PMC 2023)**
+   Mulla & Gharpure — survey of methodologies, datasets, and evaluation metrics
+   https://pmc.ncbi.nlm.nih.gov/articles/PMC9886210/
+
+**What differentiates this project from the above:**
+- End-to-end pipeline with an interactive quiz UI (most papers only generate questions)
+- NER-type-matching distractor strategy (distractors are always the same entity type as the answer)
+- Multi-layer quality filtering at both the question and the MCQ level
+- Answer-circularity detection (rejects questions in which the answer appears in the question)
+
+---
+
+## Course Outcomes Covered
+
+| CO | Description | How this project covers it |
+|---|---|---|
+| CO1 | Articulate NLP and word representation | TF-IDF, NER, WordNet, and word embeddings all implemented and explained |
+| CO2 | Build deep learning models for NLP problems | T5 transformer for QG (seq2seq), beam-search decoding, transfer learning |
+| CO3 | Implement ML/DL solutions in a real context | End-to-end deployable system with a Streamlit UI and interactive demo |
+
+---
+
+## Author
+
+**Tanmay Jain**
+Bennett University
+
+---
+
+*Built with spaCy, Hugging Face Transformers, NLTK, scikit-learn, and Streamlit.*
app/main.py CHANGED
@@ -1,36 +1,31 @@
1
- # ─────────────────────────────────────────────
2
- # app/main.py
3
- # Streamlit UI β€” the full interactive quiz app.
4
- #
5
- # Run with: streamlit run app/main.py
6
- #
7
- # Three screens:
8
- # 1. INPUT β†’ user pastes a passage, picks # of questions
9
- # 2. QUIZ β†’ one question at a time with radio buttons
10
- # 3. RESULTS β†’ score + per-question feedback
11
- # ─────────────────────────────────────────────
12
-
13
  import streamlit as st
14
  import sys, os
15
 
16
- # Make sure we can import from project root
17
  sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
18
 
19
  from config import APP_TITLE, APP_ICON, MAX_QUESTIONS
20
- from src.mcq_builder import build_quiz
21
- from src.evaluator import score_quiz
22
- from app.components import render_question_card, render_result_card, render_score_summary
23
-
24
-
25
- # ─────────────────────────────────────────────
26
- # PAGE CONFIG β€” must be first Streamlit call
27
- # ─────────────────────────────────────────────
28
 
 
 
 
 
 
 
29
 
 
 
30
 
31
- #cache
 
 
 
 
 
32
 
33
- import streamlit as st
 
 
34
 
35
  @st.cache_resource
36
  def load_pipeline():
@@ -40,214 +35,176 @@ def load_pipeline():
40
  build_quiz = load_pipeline()
41
 
42
 
43
- st.set_page_config(
44
- page_title = APP_TITLE,
45
- page_icon = APP_ICON,
46
- layout = "centered",
47
- )
48
-
49
-
50
  # ─────────────────────────────────────────────
51
- # SESSION STATE INITIALISATION
52
- # st.session_state persists values across reruns.
53
- # Think of it as the app's memory.
54
  # ─────────────────────────────────────────────
55
 
56
  def init_state():
57
  defaults = {
58
- "screen" : "input", # "input" | "quiz" | "results"
59
- "mcqs" : [], # list of MCQ objects
60
- "current_q" : 0, # index of current question
61
- "user_answers" : [], # user's selected option indices
62
- "quiz_result" : None, # scored result dict
63
  }
64
- for key, val in defaults.items():
65
- if key not in st.session_state:
66
- st.session_state[key] = val
67
 
68
  init_state()
69
 
70
 
71
  # ─────────────────────────────────────────────
72
- # HELPER: reset to start a new quiz
73
  # ─────────────────────────────────────────────
74
 
75
  def reset():
76
- st.session_state.screen = "input"
77
- st.session_state.mcqs = []
78
- st.session_state.current_q = 0
79
  st.session_state.user_answers = []
80
- st.session_state.quiz_result = None
81
 
82
 
83
  # ─────────────────────────────────────────────
84
  # SCREEN 1: INPUT
85
- # User pastes a passage and hits "Generate Quiz"
86
  # ─────────────────────────────────────────────
87
 
88
  def screen_input():
89
  st.title(f"{APP_ICON} {APP_TITLE}")
90
- st.write("Paste any text passage below to automatically generate a quiz from it.")
91
-
92
- st.info(
93
- "**For best results**, use factual passages containing: "
94
- "**people names, places, dates, organisations, or events.** \n"
95
- "Try: history, science, geography, biographies. \n"
96
- "Avoid opinion or purely descriptive text β€” they lack named facts."
97
- )
98
-
99
- st.markdown("---")
100
 
101
  passage = st.text_area(
102
- label = "Your passage",
103
- placeholder = "Paste a paragraph or article here...",
104
- height = 250,
105
- help = "Minimum ~5 sentences recommended for best results.",
106
  )
107
 
108
  num_questions = st.slider(
109
- label = "Number of questions",
110
- min_value = 3,
111
- max_value = MAX_QUESTIONS,
112
- value = 5,
113
- step = 1,
114
  )
115
 
116
- st.markdown("---")
117
 
118
- if st.button("Generate Quiz", type="primary", use_container_width=True):
119
  if not passage or len(passage.split()) < 30:
120
- st.warning("Please paste a longer passage (at least ~30 words).")
121
  return
122
 
123
- with st.spinner("Generating questions... this may take 30–60 seconds on first run."):
 
124
  try:
125
  mcqs = build_quiz(passage, num_questions=num_questions)
126
  except Exception as e:
127
- st.error(f"Something went wrong: {e}")
128
  return
129
 
130
  if not mcqs:
131
- st.error("Could not generate questions from this passage. Try a different text.")
132
  return
133
 
134
- # Store in session and move to quiz screen
135
- st.session_state.mcqs = mcqs
136
- st.session_state.user_answers = [-1] * len(mcqs) # -1 = unanswered
137
- st.session_state.current_q = 0
138
- st.session_state.screen = "quiz"
139
      st.rerun()


  # ─────────────────────────────────────────────
  # SCREEN 2: QUIZ
- # One question at a time, with navigation.
  # ─────────────────────────────────────────────

  def screen_quiz():
-     mcqs = st.session_state.mcqs
-     current = st.session_state.current_q
-     total = len(mcqs)
-     mcq = mcqs[current]

-     # Progress bar
-     st.progress(current / total, text=f"Question {current+1} of {total}")
-     st.markdown("---")

-     # Render the question card (defined in components.py)
-     selected_label = render_question_card(mcq, current)

-     st.markdown("---")

      col1, col2, col3 = st.columns([1, 2, 1])

-     # Previous button
      with col1:
          if current > 0:
-             if st.button("← Previous"):
                  st.session_state.current_q -= 1
                  st.rerun()

-     # Next / Submit button
      with col3:
-         # Convert selected label (A/B/C/D) back to index
          if selected_label:
-             selected_index = ord(selected_label) - ord("A")
-             st.session_state.user_answers[current] = selected_index

          if current < total - 1:
              if st.button("Next →", type="primary"):
                  if selected_label is None:
-                     st.warning("Please select an answer before continuing.")
                  else:
                      st.session_state.current_q += 1
                      st.rerun()
          else:
-             # Last question — show Submit button
-             if st.button("Submit Quiz", type="primary"):
-                 if selected_label is None:
-                     st.warning("Please select an answer before submitting.")
-                 else:
-                     # Score the quiz
-                     result = score_quiz(
-                         st.session_state.mcqs,
-                         st.session_state.user_answers
-                     )
-                     st.session_state.quiz_result = result
-                     st.session_state.screen = "results"
-                     st.rerun()

-     # Show quit option
      with col2:
-         if st.button("Quit Quiz", help="Return to the input screen"):
              reset()
              st.rerun()


  # ─────────────────────────────────────────────
  # SCREEN 3: RESULTS
- # Score summary + per-question breakdown
  # ─────────────────────────────────────────────

  def screen_results():
      result = st.session_state.quiz_result

-     st.title("Quiz Complete!")
-     st.markdown("---")

-     # Score summary banner
      render_score_summary(result)

-     st.markdown("---")
-     st.subheader("Question-by-question breakdown")
-
-     # Per-question result cards
      for i, r in enumerate(result["results"]):
          render_result_card(r, i + 1)

-     st.markdown("---")
-
      col1, col2 = st.columns(2)

      with col1:
-         if st.button("Try Another Passage", use_container_width=True):
              reset()
              st.rerun()

      with col2:
-         if st.button("Retake Same Quiz", type="primary", use_container_width=True):
-             # Reset answers but keep the same MCQs
              st.session_state.user_answers = [-1] * len(st.session_state.mcqs)
-             st.session_state.current_q = 0
-             st.session_state.screen = "quiz"
              st.rerun()


  # ─────────────────────────────────────────────
- # ROUTER — picks which screen to show
  # ─────────────────────────────────────────────

  if st.session_state.screen == "input":
      screen_input()

  elif st.session_state.screen == "quiz":
      screen_quiz()

  elif st.session_state.screen == "results":
      screen_results()
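The quiz screen records an answer by mapping the selected option letter (A–D) back to a 0-based index with `ord` arithmetic. A minimal standalone sketch of that round-trip (helper names are illustrative, not part of the app):

```python
def label_to_index(label: str) -> int:
    """Map an option letter ('A'..'D') to a 0-based index: 'A' -> 0, 'B' -> 1, ..."""
    return ord(label.upper()) - ord("A")

def index_to_label(index: int) -> str:
    """Inverse mapping: 0 -> 'A', 1 -> 'B', ..."""
    return chr(ord("A") + index)

assert label_to_index("C") == 2       # → 2
assert index_to_label(0) == "A"       # → 'A'
# The two functions are inverses over the four option slots.
assert all(label_to_index(index_to_label(i)) == i for i in range(4))
```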
 
 
 
 
 
 
 
 
 
 
 
 
 
  import streamlit as st
  import sys, os

+ # ✅ FIX: Add project root first
  sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

  from config import APP_TITLE, APP_ICON, MAX_QUESTIONS

+ # ✅ FIRST Streamlit call
+ st.set_page_config(
+     page_title=APP_TITLE,
+     page_icon=APP_ICON,
+     layout="centered",
+ )

+ # Add project root to path
+ sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

+ from src.evaluator import score_quiz
+ from app.components import (
+     render_question_card,
+     render_result_card,
+     render_score_summary
+ )

+ # ─────────────────────────────────────────────
+ # CACHE MODEL (important for performance)
+ # ─────────────────────────────────────────────

  @st.cache_resource
  def load_pipeline():
      ...

  build_quiz = load_pipeline()


  # ─────────────────────────────────────────────
+ # SESSION STATE
  # ─────────────────────────────────────────────

  def init_state():
      defaults = {
+         "screen": "input",
+         "mcqs": [],
+         "current_q": 0,
+         "user_answers": [],
+         "quiz_result": None,
      }
+     for k, v in defaults.items():
+         if k not in st.session_state:
+             st.session_state[k] = v

  init_state()


  # ─────────────────────────────────────────────
+ # RESET
  # ─────────────────────────────────────────────

  def reset():
+     st.session_state.screen = "input"
+     st.session_state.mcqs = []
+     st.session_state.current_q = 0
      st.session_state.user_answers = []
+     st.session_state.quiz_result = None


  # ─────────────────────────────────────────────
  # SCREEN 1: INPUT
  # ─────────────────────────────────────────────

  def screen_input():
      st.title(f"{APP_ICON} {APP_TITLE}")
+     st.write("Paste text to generate MCQs")

      passage = st.text_area(
+         "Your passage",
+         height=250,
+         placeholder="Paste content here..."
      )

      num_questions = st.slider(
+         "Number of questions",
+         3,
+         MAX_QUESTIONS,
+         5
      )

+     if st.button("Generate Quiz", type="primary"):

          if not passage or len(passage.split()) < 30:
+             st.warning("Enter at least 30 words")
              return

+         with st.spinner("Generating questions..."):
+
              try:
                  mcqs = build_quiz(passage, num_questions=num_questions)
              except Exception as e:
+                 st.error(f"Error: {e}")
                  return

              if not mcqs:
+                 st.error("Failed to generate questions")
                  return

+             st.session_state.mcqs = mcqs
+             st.session_state.user_answers = [-1] * len(mcqs)
+             st.session_state.current_q = 0
+             st.session_state.screen = "quiz"
+
              st.rerun()


  # ─────────────────────────────────────────────
  # SCREEN 2: QUIZ
  # ─────────────────────────────────────────────

  def screen_quiz():
+     mcqs = st.session_state.mcqs
+     current = st.session_state.current_q
+     total = len(mcqs)

+     mcq = mcqs[current]

+     st.progress(current / total, text=f"Q {current+1}/{total}")

+     selected_label = render_question_card(mcq, current)

      col1, col2, col3 = st.columns([1, 2, 1])

+     # Previous
      with col1:
          if current > 0:
+             if st.button("← Prev"):
                  st.session_state.current_q -= 1
                  st.rerun()

+     # Next / Submit
      with col3:

          if selected_label:
+             idx = ord(selected_label) - ord("A")
+             st.session_state.user_answers[current] = idx

          if current < total - 1:
              if st.button("Next →", type="primary"):
                  if selected_label is None:
+                     st.warning("Select an answer")
                  else:
                      st.session_state.current_q += 1
                      st.rerun()
          else:
+             if st.button("Submit", type="primary"):
+                 result = score_quiz(
+                     st.session_state.mcqs,
+                     st.session_state.user_answers
+                 )
+                 st.session_state.quiz_result = result
+                 st.session_state.screen = "results"
+                 st.rerun()

+     # Quit
      with col2:
+         if st.button("Quit"):
              reset()
              st.rerun()


  # ─────────────────────────────────────────────
  # SCREEN 3: RESULTS
  # ─────────────────────────────────────────────

  def screen_results():
      result = st.session_state.quiz_result

+     st.title("Quiz Complete")

      render_score_summary(result)

      for i, r in enumerate(result["results"]):
          render_result_card(r, i + 1)

      col1, col2 = st.columns(2)
+
      with col1:
+         if st.button("New Quiz"):
              reset()
              st.rerun()
+
      with col2:
+         if st.button("Retry"):
              st.session_state.user_answers = [-1] * len(st.session_state.mcqs)
+             st.session_state.current_q = 0
+             st.session_state.screen = "quiz"
              st.rerun()


  # ─────────────────────────────────────────────
+ # ROUTER
  # ─────────────────────────────────────────────

  if st.session_state.screen == "input":
      screen_input()
+
  elif st.session_state.screen == "quiz":
      screen_quiz()
+
  elif st.session_state.screen == "results":
      screen_results()
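`score_quiz` lives in `src/evaluator.py` and is not shown in this diff; the app only requires that its return value carry a `"results"` list (iterated in `screen_results`). A hedged sketch of a scorer with a compatible shape, assuming each MCQ dict holds a 0-based `answer_index` and unanswered questions stay at the `-1` used to initialize `user_answers` (field names other than `"results"` are assumptions, not the real evaluator):

```python
def score_quiz(mcqs, user_answers):
    # Assumed MCQ shape: {"question": str, "answer_index": int}.
    # Unanswered questions are -1, matching the [-1] * len(mcqs) init.
    results = []
    correct = 0
    for mcq, picked in zip(mcqs, user_answers):
        is_correct = picked == mcq["answer_index"]
        correct += is_correct
        results.append({
            "question": mcq.get("question", ""),
            "picked": picked,
            "correct_index": mcq["answer_index"],
            "is_correct": is_correct,
        })
    return {"score": correct, "total": len(mcqs), "results": results}

demo = [
    {"question": "2 + 2 = ?", "answer_index": 1},
    {"question": "Capital of France?", "answer_index": 0},
]
out = score_quiz(demo, [1, 2])
assert out["score"] == 1 and out["total"] == 2  # one right, one wrong
```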
finetune_t5_file.ipynb CHANGED
The diff for this file is too large to render. See raw diff
 
requirements.txt CHANGED
Binary files a/requirements.txt and b/requirements.txt differ
 
runtime.txt DELETED
@@ -1 +0,0 @@
- python-3.11
setup.sh DELETED
@@ -1 +0,0 @@
- python -m spacy download en_core_web_sm
src/question_generator.py CHANGED
@@ -10,6 +10,18 @@ from transformers import pipeline
  import re
  import sys, os

+ import streamlit as st
+
+ @st.cache_resource
+ def load_model():
+     tokenizer = AutoTokenizer.from_pretrained("valhalla/t5-small-qg-hl", use_fast=False)
+     model = T5ForConditionalGeneration.from_pretrained("valhalla/t5-small-qg-hl")
+     model.eval()
+     return tokenizer, model
+
+ tokenizer, qg_model = load_model()
+
+
  sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
  from config import QG_MODEL_NAME, MAX_QUESTIONS
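`@st.cache_resource` makes the tokenizer/model pair load once per server process and be reused across Streamlit reruns instead of being reloaded on every script execution. The effect is comparable in spirit to stdlib memoization, sketched here with a call counter (the `load_model` name mirrors the function above; the loader body is a stand-in, not the real transformers call):

```python
from functools import lru_cache

CALLS = {"load_model": 0}

@lru_cache(maxsize=None)
def load_model():
    # Stand-in for the expensive tokenizer/model load; runs at most once.
    CALLS["load_model"] += 1
    return ("tokenizer", "model")

tok1, mdl1 = load_model()
tok2, mdl2 = load_model()
assert CALLS["load_model"] == 1            # loaded once, reused thereafter
assert (tok1, mdl1) == (tok2, mdl2)        # same cached objects returned
```

Unlike `lru_cache`, `st.cache_resource` is also shared across user sessions and survives script reruns, which is why it is the right tool for model weights here.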