---
title: MCQ Generator
emoji: π
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.33.0
app_file: app/main.py
pinned: false
---
# MCQ Generator – Automatic Multiple Choice Question Generator

An end-to-end NLP pipeline that reads any text passage and automatically generates a complete multiple-choice quiz with scoring and explanations.

Built as a course project for an NLP curriculum covering Modules I–IV: tokenization, word embeddings, transformers, and natural language generation.
## Table of Contents
- What This Project Does
- Live Demo
- How It Works β The Full Pipeline
- NLP Techniques Used
- Project Structure
- Each File Explained
- Tech Stack
- Setup & Installation
- Running the App
- Testing Each Module
- Sample Output
- What Makes a Good Passage
- Known Limitations
- Future Work
- Related Research
- Course Outcomes Covered
## What This Project Does
Given any factual text passage, this system:
- Extracts the most important sentences using TF-IDF ranking
- Identifies answer candidates using Named Entity Recognition (NER)
- Generates natural language questions using a T5 transformer model
- Creates plausible wrong options (distractors) using WordNet and NER
- Presents an interactive quiz with scoring and per-question explanations
**Example:**

Input passage:

> Albert Einstein was born on March 14, 1879, in Ulm, Germany.
> He was awarded the Nobel Prize in Physics in 1921 for his
> discovery of the photoelectric effect.

Generated MCQ:

```
Q: Where was Albert Einstein born?
A. France
B. Germany ✓
C. United States
D. Switzerland
```
## Live Demo

```bash
streamlit run app/main.py
```

Opens at http://localhost:8501 in your browser.
## How It Works – The Full Pipeline

```
Raw Text Passage
        │
        ▼
┌─────────────────────────────────────────────┐
│ STEP 1: PREPROCESSING (preprocessor.py)     │
│                                             │
│ • Split into sentences (spaCy)              │
│ • Rank by TF-IDF score (scikit-learn)       │
│ • Extract Named Entities (spaCy NER)        │
│ • Filter answer candidates (blacklist)      │
└────────────────────┬────────────────────────┘
                     │ top sentences + answer candidates
                     ▼
┌─────────────────────────────────────────────┐
│ STEP 2: QUESTION GENERATION                 │
│ (question_generator.py)                     │
│                                             │
│ • Highlight answer in sentence with <hl>    │
│ • Feed to T5 transformer model              │
│ • Generate 3 candidate questions            │
│ • Validate: reject circular/vague Qs        │
└────────────────────┬────────────────────────┘
                     │ (question, answer) pairs
                     ▼
┌─────────────────────────────────────────────┐
│ STEP 3: DISTRACTOR GENERATION               │
│ (distractor_generator.py)                   │
│                                             │
│ Strategy 1: Same-type NER entities          │
│             from the passage                │
│ Strategy 2: WordNet hyponym siblings        │
│ Strategy 3: Cross-label fallback            │
└────────────────────┬────────────────────────┘
                     │ 3 wrong options per question
                     ▼
┌─────────────────────────────────────────────┐
│ STEP 4: MCQ ASSEMBLY + VALIDATION           │
│ (mcq_builder.py)                            │
│                                             │
│ • Combine answer + distractors              │
│ • Shuffle options randomly                  │
│ • Quality gate: dedup, similarity check     │
└────────────────────┬────────────────────────┘
                     │ validated MCQ list
                     ▼
┌─────────────────────────────────────────────┐
│ STEP 5: QUIZ UI + SCORING                   │
│ (app/main.py + evaluator.py)                │
│                                             │
│ • Streamlit 3-screen app                    │
│ • Input → Quiz → Results                    │
│ • Score, feedback, explanations             │
└─────────────────────────────────────────────┘
```
## NLP Techniques Used

### Module I – Foundational NLP

| Technique | Where Used | Purpose |
|---|---|---|
| Tokenization | `preprocessor.py` | Split text into sentences and tokens using spaCy |
| Lemmatization | `preprocessor.py` | Normalize word forms for TF-IDF |
| Stop word removal | `preprocessor.py` | Filter noise before TF-IDF scoring |
| Named Entity Recognition (NER) | `preprocessor.py` | Find PERSON, ORG, DATE, GPE as answer candidates |
| POS Tagging | `preprocessor.py` | Identify nouns and proper nouns |
| WordNet | `distractor_generator.py` | Find semantically related words as distractors |
| Synsets / Hyponyms | `distractor_generator.py` | Navigate the WordNet hierarchy for same-category words |
### Module II – Word Representation

| Technique | Where Used | Purpose |
|---|---|---|
| TF-IDF | `preprocessor.py` | Rank sentences by information density |
| Word Embeddings (GloVe) | `distractor_generator.py` | Optional cosine-similarity based distractor finding |

TF-IDF explained:

- TF (Term Frequency) = how often a word appears in this sentence
- IDF (Inverse Document Frequency) = how rare the word is across all sentences
- A high TF-IDF score means the sentence contains rare, informative words → a good question source
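The ranking idea can be sketched in pure Python. This is a simplified stand-in for the scikit-learn `TfidfVectorizer` used in `preprocessor.py`; the function name and exact weighting are illustrative:

```python
import math
import re
from collections import Counter

def rank_sentences(sentences, top_n=3):
    """Rank sentences by the average TF-IDF weight of their tokens (sketch)."""
    docs = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(w for doc in docs for w in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # TF * IDF summed over tokens, normalized by sentence length.
        total = sum(tf[w] * math.log(n / df[w]) for w in tf)
        scores.append(total / len(doc) if doc else 0.0)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in order[:top_n]]
```

A sentence full of words that appear nowhere else in the passage scores high; a sentence of common filler words scores near zero.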
### Module III – Deep Learning for NLP

| Technique | Where Used | Purpose |
|---|---|---|
| Transformers | `question_generator.py` | T5 model for question generation |
| Transfer Learning | `question_generator.py` | Using a pre-trained T5 fine-tuned on SQuAD |
| Seq2Seq | `question_generator.py` | Encoder-decoder architecture of T5 |
| Beam Search | `question_generator.py` | Generate multiple question candidates, pick the best |
### Module IV – Advanced NLP

| Technique | Where Used | Purpose |
|---|---|---|
| T5 (Text-to-Text Transfer Transformer) | `question_generator.py` | State-of-the-art QG model |
| Natural Language Generation (NLG) | `question_generator.py` | Generating grammatical questions |
| Subword Tokenization (SentencePiece) | `question_generator.py` | T5's tokenizer handles rare/unknown words |
| Pre-trained Models | `question_generator.py` | `valhalla/t5-small-qg-hl` from HuggingFace |
## Project Structure

```
mcq_generator/
│
├── src/                         # Core NLP pipeline modules
│   ├── __init__.py
│   ├── preprocessor.py          # Text cleaning, TF-IDF, NER, answer extraction
│   ├── question_generator.py    # T5-based question generation
│   ├── distractor_generator.py  # WordNet + NER distractor generation
│   ├── mcq_builder.py           # Pipeline orchestrator + MCQ dataclass
│   └── evaluator.py             # Answer checking and scoring
│
├── app/                         # Streamlit web application
│   ├── __init__.py
│   ├── main.py                  # 3-screen app: input → quiz → results
│   └── components.py            # Reusable UI components
│
├── data/
│   └── sample_passages.json     # 5 test passages (ISRO, Gandhi, AI, etc.)
│
├── models/                      # (gitignored) Downloaded model files
│   └── README.md
│
├── notebooks/                   # Jupyter notebooks for exploration
│
├── config.py                    # All settings in one place
├── requirements.txt             # Python dependencies
└── README.md                    # This file
```
## Each File Explained

### config.py

Central settings file. Every other module imports from here.

- Model name, number of questions, sentence count, file paths
- Change values here to tune the entire system without touching logic files
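A settings file in this style might look like the following sketch. The constant names and values below are illustrative assumptions, not the actual contents of `config.py` (only the model id and library names come from this README):

```python
# Illustrative config.py sketch; names and values are assumptions.
QG_MODEL_NAME = "valhalla/t5-small-qg-hl"  # HuggingFace model id used by the project
SPACY_MODEL = "en_core_web_sm"             # spaCy pipeline for NER / sentence splitting
NUM_QUESTIONS = 5                          # MCQs to generate per passage
TOP_SENTENCES = 8                          # TF-IDF-ranked sentences to keep
NUM_DISTRACTORS = 3                        # wrong options per question
```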
### src/preprocessor.py

The NLP foundation of the project.

Key functions:

- `extract_sentences(text)` – spaCy sentence boundary detection
- `rank_sentences(sentences)` – TF-IDF scoring, returns the top N most informative sentences
- `extract_answer_candidates(sentence)` – NER-based extraction with strict quality filters
- `preprocess(text)` – full pipeline, returns a structured dict

Design decisions:

- Only `PERSON`, `ORG`, `GPE`, `DATE`, `EVENT`, `WORK_OF_ART` NER labels are accepted as answers
- A `BLACKLIST` of 30+ generic words ("annual", "various", "Moon") prevents trivial answers
- Answers are sorted by priority: PERSON > ORG/GPE > DATE > others
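These rules can be sketched as a small filter. This is a simplified stand-in for the real function: spaCy entities are represented as plain `(text, label)` pairs, and the blacklist here is only an excerpt:

```python
ACCEPTED_LABELS = {"PERSON", "ORG", "GPE", "DATE", "EVENT", "WORK_OF_ART"}
BLACKLIST = {"annual", "various", "moon"}   # excerpt; the real list has 30+ entries
PRIORITY = {"PERSON": 0, "ORG": 1, "GPE": 1, "DATE": 2}  # lower = preferred

def filter_candidates(entities):
    """entities: (text, label) pairs, e.g. (ent.text, ent.label_) from spaCy."""
    kept = [(text, label) for text, label in entities
            if label in ACCEPTED_LABELS and text.lower() not in BLACKLIST]
    # Stable sort keeps passage order within the same priority tier.
    return sorted(kept, key=lambda e: PRIORITY.get(e[1], 3))
```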
### src/question_generator.py

Uses the `valhalla/t5-small-qg-hl` model – a T5-small fine-tuned on SQuAD for question generation.

How T5 QG works:

```
Input:  "generate question: ISRO was founded in <hl> 1969 <hl> by Vikram Sarabhai."
Output: "In what year was ISRO founded?"
```

Key functions:

- `highlight_answer(sentence, answer)` – wraps the answer in `<hl>` tags
- `generate_question(sentence, answer)` – beam search with 5 beams, 3 candidates
- `answer_is_addressable(question, answer)` – rejects circular, vague, or short questions

Quality filters applied:

- Must start with a question word (what/who/when/where/which/how)
- Answer must NOT appear in the question
- Abbreviation trap detection (e.g. rejects Q: "What does ISRO stand for?" when A is the full name)
- Minimum question length of 5 words
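The filter logic amounts to something like the sketch below (simplified: the abbreviation-trap check is omitted, and the function name is illustrative, not the project's actual API):

```python
QUESTION_WORDS = ("what", "who", "when", "where", "which", "how")

def passes_quality_filters(question: str, answer: str) -> bool:
    """Return True only if the generated question survives all three checks."""
    q = question.lower().strip()
    if not q.startswith(QUESTION_WORDS):
        return False   # must open with a question word
    if answer.lower() in q:
        return False   # circular: the answer leaked into the question
    if len(q.split()) < 5:
        return False   # too short to be a meaningful question
    return True
```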
### src/distractor_generator.py

Generates 3 plausible wrong answer options using a priority-based strategy chain.

**Strategy 1 – Same-label NER (best):** finds other entities of the same NER type from the passage.

```
Answer: "1969" (DATE)              → Distractors: ["1975", "2008", "2023"] (other DATEs in passage)
Answer: "Vikram Sarabhai" (PERSON) → Distractors: ["Kalam", "Dhawan", "Nehru"]
```

**Strategy 2 – WordNet hyponyms:** navigates the WordNet hierarchy to find sibling words in the same semantic category.

```
Answer: "India" → hypernym: "country" → hyponyms: ["China", "Brazil", "Pakistan"]
```

**Strategy 3 – Cross-label fallback:** uses any other named entity from the passage if strategies 1 and 2 fail.
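Strategy 2's hypernym-to-hyponym walk can be illustrated with a toy taxonomy. The real module queries WordNet through NLTK; the hand-made dictionary below is only a stand-in for that graph:

```python
# Toy taxonomy standing in for WordNet's hypernym/hyponym structure.
HYPONYMS = {"country": ["India", "China", "Brazil", "Pakistan"]}
HYPERNYM = {child: parent for parent, kids in HYPONYMS.items() for child in kids}

def sibling_distractors(answer, k=3):
    """Climb to the answer's hypernym, then return up to k sibling hyponyms."""
    parent = HYPERNYM.get(answer)
    if parent is None:
        return []   # word not in the taxonomy: fall through to Strategy 3
    return [w for w in HYPONYMS[parent] if w != answer][:k]
```

Siblings of the same hypernym are plausible but wrong, which is exactly what a distractor needs to be.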
### src/mcq_builder.py

The single entry point that the UI calls. Orchestrates the entire pipeline.

MCQ dataclass:

```python
@dataclass
class MCQ:
    question: str
    options: list        # 4 shuffled options
    correct_index: int   # index of the correct answer (0-3)
    correct_answer: str
    explanation: str     # original sentence
```

Quality gate `is_valid_mcq()`:

- No two options can be too similar (catches "WWE" vs "World Wrestling Entertainment")
- The answer must appear exactly once in the options
- At most 1 generic placeholder option is allowed
- The answer must not appear in the question text
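A sketch of the assembly and gate, under simplifying assumptions: the similarity check here is exact lowercase duplication, whereas the real gate is fuzzier, and the function names are illustrative:

```python
import random

def assemble_options(answer, distractors, seed=None):
    """Combine the answer with 3 distractors, shuffle, return (options, answer index)."""
    options = [answer] + list(distractors)[:3]
    random.Random(seed).shuffle(options)
    return options, options.index(answer)

def is_valid_mcq(question, options, answer):
    if len({o.lower() for o in options}) != len(options):
        return False   # duplicate / near-identical options
    if sum(o == answer for o in options) != 1:
        return False   # answer must appear exactly once
    if answer.lower() in question.lower():
        return False   # circularity: answer visible in the question
    return True
```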
### src/evaluator.py

Checks answers and computes scores. Returns:

```python
{
    "score":      7,
    "total":      10,
    "percentage": 70.0,
    "feedback":   "Good effort! Review the explanations...",
    "results":    [ {per-question breakdown} ]
}
```
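The scoring logic can be sketched as follows. The feedback thresholds and strings are illustrative assumptions; only the dict shape comes from the module's documented return value:

```python
def evaluate(mcqs, chosen_indices):
    """mcqs: dicts with a 'correct_index' key; chosen_indices: the user's picks."""
    results = [pick == m["correct_index"] for m, pick in zip(mcqs, chosen_indices)]
    score, total = sum(results), len(results)
    pct = round(100.0 * score / total, 1) if total else 0.0
    # Illustrative feedback tiers (assumed, not the project's exact strings).
    if pct >= 80:
        feedback = "Excellent!"
    elif pct >= 50:
        feedback = "Good effort! Review the explanations..."
    else:
        feedback = "Keep practicing."
    return {"score": score, "total": total, "percentage": pct,
            "feedback": feedback, "results": results}
```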
### app/main.py

Streamlit app with 3 screens managed via `st.session_state`:

- **Screen 1 (input):** text area + question count slider + Generate button
- **Screen 2 (quiz):** one question at a time, radio buttons, Previous/Next/Submit
- **Screen 3 (results):** score banner + per-question feedback with explanations
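The screen flow reduces to a tiny state machine. In this sketch a plain dict stands in for `st.session_state`, and the helper name is illustrative:

```python
SCREENS = ("input", "quiz", "results")

def advance_screen(state):
    """Move the session-state dict to the next screen, clamping at 'results'."""
    current = state.get("screen", "input")
    i = SCREENS.index(current)
    state["screen"] = SCREENS[min(i + 1, len(SCREENS) - 1)]
    return state
```

In the real app the same idea is driven by button callbacks that mutate `st.session_state` between reruns.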
### app/components.py

Reusable display functions:

- `render_question_card()` – A/B/C/D labelled radio buttons
- `render_result_card()` – green (correct) / red (wrong) with explanation
- `render_score_summary()` – score banner + metric cards
## Tech Stack

| Library | Version | Purpose |
|---|---|---|
| spaCy | 3.7.4 | Tokenization, NER, POS tagging, sentence splitting |
| transformers | 4.38.2 | T5 model for question generation |
| torch | 2.2.1 | PyTorch backend for transformers |
| nltk | 3.8.1 | WordNet access for distractor generation |
| scikit-learn | 1.4.1.post1 | TF-IDF vectorizer |
| sentencepiece | latest | T5's subword tokenizer |
| streamlit | 1.33.0 | Web UI framework |
| gensim | 4.3.2 | Word2Vec / GloVe loading (optional) |
| numpy | 1.26.4 | TF-IDF matrix operations |

Pre-trained model used: `valhalla/t5-small-qg-hl` – a T5-small fine-tuned on SQuAD 1.0 for answer-aware question generation using the highlight format. Hosted on the HuggingFace Hub and downloaded automatically on first run (~240 MB).
## Setup & Installation

### Prerequisites

- Python 3.11+
- pip
- Internet connection (first run downloads the T5 model)

### Step 1 – Clone the repository

```bash
git clone https://github.com/tanmmayyy/mcq-generator.git
cd mcq-generator
```

### Step 2 – Create a virtual environment

```bash
python -m venv myenv
# Windows
myenv\Scripts\activate
# Mac/Linux
source myenv/bin/activate
```

### Step 3 – Install dependencies

```bash
pip install -r requirements.txt
pip install sentencepiece   # required for the T5 tokenizer
```

### Step 4 – Download the spaCy language model

```bash
python -m spacy download en_core_web_sm
# If the default command fails:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
```

### Step 5 – Verify the installation

```bash
python -c "import spacy; nlp = spacy.load('en_core_web_sm'); print('spaCy OK')"
python -c "from transformers import pipeline; print('Transformers OK')"
```
## Running the App

```bash
streamlit run app/main.py
```

The app opens at http://localhost:8501. On first launch, the T5 model downloads (~240 MB) and loads into memory, which takes 1-2 minutes. Subsequent launches are fast.
## Testing Each Module

Run these in order to verify that each step of the pipeline works independently:

```bash
# Step 1 – Test preprocessing (NER, TF-IDF, sentence ranking)
python src/preprocessor.py

# Step 2 – Test question generation (T5 model)
python src/question_generator.py

# Step 3 – Test distractor generation (WordNet + NER)
python src/distractor_generator.py

# Step 4 – Test the full pipeline end-to-end
python src/mcq_builder.py

# Step 5 – Test scoring
python src/evaluator.py
```
## Sample Output

Input passage (ISRO):

> The Indian Space Research Organisation (ISRO) was founded in 1969 by Vikram Sarabhai.
> ISRO developed India's first satellite, Aryabhata, which was launched in 1975.
> The Chandrayaan-1 mission in 2008 discovered water molecules on the Moon.
> In 2023, Chandrayaan-3 successfully landed near the lunar south pole.
> The Mars Orbiter Mission, also called Mangalyaan, was launched in 2013.

Generated questions:

```
Q1: Who founded ISRO?
A. Jawaharlal Nehru
B. APJ Abdul Kalam
C. Vikram Sarabhai ✓
D. Homi Bhabha

Q2: What was India's first satellite called?
A. Chandrayaan
B. Mangalyaan
C. Rohini
D. Aryabhata ✓

Q3: When did the Chandrayaan-1 mission take place?
A. 1975
B. 2013
C. 2023
D. 2008 ✓

Q4: What mission made India the first Asian country to reach Mars orbit?
A. Chandrayaan-3
B. Aryabhata
C. Mangalyaan ✓
D. Chandrayaan-1
```
## What Makes a Good Passage
The system performs best on factual passages that contain:
| Works well | Works poorly |
|---|---|
| People names (PERSON entities) | Opinion / descriptive text |
| Specific dates (DATE entities) | Passages with repeated entities |
| Organisation names (ORG entities) | Very short passages (< 5 sentences) |
| Place names (GPE entities) | Abstract/philosophical text |
| One clear fact per sentence | Sentences with multiple facts |
Best passage types: History, science, geography, biographies, Wikipedia-style articles
Avoid: Opinion pieces, marketing content, descriptive narratives without specific facts
## Known Limitations

- **Passage type dependency** – works best on factual text. Descriptive or opinion text produces poor questions because there are no named entities to use as answers.
- **T5-small quality ceiling** – the model used (`t5-small`) has 60M parameters. Larger models like `t5-base` or `t5-large` would produce better questions but require more memory and time.
- **Distractor diversity** – when a passage has few named entities, distractors may fall back to generic options. Fine-tuning a separate T5 model on the RACE dataset for distractor generation would fix this.
- **English only** – the current pipeline only supports English text. Extending to Hindi or other Indic languages would require multilingual spaCy models and a multilingual QG model.
- **No semantic deduplication** – two questions from the same passage can sometimes be semantically similar even if worded differently.
## Future Work
- Fine-tune a T5 distractor generation model on the RACE dataset (100k exam questions)
- Add support for Hindi using IndicNLP + multilingual BERT
- Add PDF upload support so users can quiz themselves on any document
- BLEU/METEOR/ROUGE automated evaluation of generated questions
- Difficulty scoring per question based on distractor plausibility
- Export quiz as PDF for offline use
## Related Research

Papers that use similar approaches, cited for comparison:

- **Automatic Generation of Multiple-Choice Questions** (2023), Zhang et al. – T5 with pre/postprocessing pipelines for MCQ generation. https://arxiv.org/abs/2303.14576
- **Deep Learning and Linguistic Feature Based Automatic MCQ Generation** (Springer, ICDCIT 2022), Agarwal et al. – DL + linguistic features for MCQ generation (same 3-step pipeline). https://link.springer.com/chapter/10.1007/978-3-030-94876-4_18
- **End-to-End MCQ Generation Using T5** (ScienceDirect, 2022), Rodriguez-Torrealba et al. – full T5-based pipeline with Wikipedia passages. https://www.sciencedirect.com/science/article/pii/S0957417422014014
- **Leaf – MCQ Generation System** (ECIR 2022), Vachev et al. – two fine-tuned T5 models: one for QG, one for DG on RACE. https://github.com/KristiyanVachev/Leaf-Question-Generation
- **Automatic Distractor Generation – Systematic Review** (PMC, 2024) – comprehensive review of distractor generation methods, including WordNet and T5. https://pmc.ncbi.nlm.nih.gov/articles/PMC11623049/
- **Automatic Question Generation: A Review** (Springer/PMC, 2023), Mulla & Gharpure – survey of methodologies, datasets, and evaluation metrics. https://pmc.ncbi.nlm.nih.gov/articles/PMC9886210/
What differentiates this project from the above:
- End-to-end pipeline with interactive quiz UI (most papers only generate questions)
- NER-type-matching distractor strategy (distractors always same entity type as answer)
- Multi-layer quality filtering at both question and MCQ level
- Answer circularity detection (rejects questions where answer appears in the question)
## Course Outcomes Covered
| CO | Description | How this project covers it |
|---|---|---|
| CO1 | Articulate NLP and word representation | TF-IDF, NER, WordNet, word embeddings all implemented and explained |
| CO2 | Build deep learning models for NLP problems | T5 transformer for QG (seq2seq), beam search decoding, transfer learning |
| CO3 | Implement ML/DL solutions in real context | End-to-end deployable system with Streamlit UI and interactive demo |
## Author

Tanmay Jain, Bennett University
Built with spaCy, HuggingFace Transformers, NLTK, scikit-learn, and Streamlit.