---
title: MCQ Generator
emoji: πŸ“
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.33.0
app_file: app/main.py
pinned: false
---
# πŸ“ MCQ Generator β€” Automatic Multiple Choice Question Generator
> **An end-to-end NLP pipeline that reads any text passage and automatically generates a complete multiple-choice quiz with scoring and explanations.**
Built as a course project for an NLP curriculum covering Modules I–IV: tokenization, word embeddings, transformers, and natural language generation.
---
## πŸ“Œ Table of Contents
1. [What This Project Does](#what-this-project-does)
2. [Live Demo](#live-demo)
3. [How It Works β€” The Full Pipeline](#how-it-works--the-full-pipeline)
4. [NLP Techniques Used](#nlp-techniques-used)
5. [Project Structure](#project-structure)
6. [Each File Explained](#each-file-explained)
7. [Tech Stack](#tech-stack)
8. [Setup & Installation](#setup--installation)
9. [Running the App](#running-the-app)
10. [Testing Each Module](#testing-each-module)
11. [Sample Output](#sample-output)
12. [What Makes a Good Passage](#what-makes-a-good-passage)
13. [Known Limitations](#known-limitations)
14. [Future Work](#future-work)
15. [Related Research](#related-research)
16. [Course Outcomes Covered](#course-outcomes-covered)
---
## What This Project Does
Given any factual text passage, this system:
1. **Extracts** the most important sentences using TF-IDF ranking
2. **Identifies** answer candidates using Named Entity Recognition (NER)
3. **Generates** natural language questions using a T5 transformer model
4. **Creates** plausible wrong options (distractors) using WordNet and NER
5. **Presents** an interactive quiz with scoring and per-question explanations
**Example:**
Input passage:
```
Albert Einstein was born on March 14, 1879, in Ulm, Germany.
He was awarded the Nobel Prize in Physics in 1921 for his
discovery of the photoelectric effect.
```
Generated MCQ:
```
Q: Where was Albert Einstein born?
A. France
B. Germany βœ“
C. United States
D. Switzerland
```
---
## Live Demo
```bash
streamlit run app/main.py
```
Opens at `http://localhost:8501` in your browser.
---
## How It Works β€” The Full Pipeline
```
Raw Text Passage
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 1: PREPROCESSING (preprocessor.py) β”‚
β”‚ β”‚
β”‚ β€’ Split into sentences (spaCy) β”‚
β”‚ β€’ Rank by TF-IDF score (scikit-learn) β”‚
β”‚ β€’ Extract Named Entities (spaCy NER) β”‚
β”‚ β€’ Filter answer candidates (blacklist) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ top sentences + answer candidates
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 2: QUESTION GENERATION β”‚
β”‚ (question_generator.py) β”‚
β”‚ β”‚
β”‚ β€’ Highlight answer in sentence with <hl> β”‚
β”‚ β€’ Feed to T5 transformer model β”‚
β”‚ β€’ Generate 3 candidate questions β”‚
β”‚ β€’ Validate: reject circular/vague Qs β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ (question, answer) pairs
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 3: DISTRACTOR GENERATION β”‚
β”‚ (distractor_generator.py) β”‚
β”‚ β”‚
β”‚ Strategy 1: Same-type NER entities β”‚
β”‚ from the passage β”‚
β”‚ Strategy 2: WordNet hyponym siblings β”‚
β”‚ Strategy 3: Cross-label fallback β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ 3 wrong options per question
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 4: MCQ ASSEMBLY + VALIDATION β”‚
β”‚ (mcq_builder.py) β”‚
β”‚ β”‚
β”‚ β€’ Combine answer + distractors β”‚
β”‚ β€’ Shuffle options randomly β”‚
β”‚ β€’ Quality gate: dedup, similarity check β”‚
β”‚ β€’ Return list of MCQ objects β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ validated MCQ list
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 5: QUIZ UI + SCORING β”‚
β”‚ (app/main.py + evaluator.py) β”‚
β”‚ β”‚
β”‚ β€’ Streamlit 3-screen app β”‚
β”‚ β€’ Input β†’ Quiz β†’ Results β”‚
β”‚ β€’ Score, feedback, explanations β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## NLP Techniques Used
### Module I β€” Foundational NLP
| Technique | Where Used | Purpose |
|---|---|---|
| Tokenization | `preprocessor.py` | Split text into sentences and tokens using spaCy |
| Lemmatization | `preprocessor.py` | Normalize word forms for TF-IDF |
| Stop word removal | `preprocessor.py` | Filter noise before TF-IDF scoring |
| Named Entity Recognition (NER) | `preprocessor.py` | Find PERSON, ORG, DATE, GPE as answer candidates |
| POS Tagging | `preprocessor.py` | Identify nouns and proper nouns |
| WordNet | `distractor_generator.py` | Find semantically related words as distractors |
| Synsets / Hyponyms | `distractor_generator.py` | Navigate WordNet hierarchy for same-category words |
### Module II β€” Word Representation
| Technique | Where Used | Purpose |
|---|---|---|
| TF-IDF | `preprocessor.py` | Rank sentences by information density |
| Word Embeddings (GloVe) | `distractor_generator.py` | Optional cosine-similarity based distractor finding |
**TF-IDF explained:**
- **TF (Term Frequency)** = how often a word appears in *this* sentence
- **IDF (Inverse Document Frequency)** = how rare the word is across *all* sentences
- High TF-IDF score = sentence contains rare, informative words β†’ good question source
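The ranking idea can be sketched with nothing but the standard library (the project itself uses scikit-learn's `TfidfVectorizer`; the tokenization and function name here are illustrative):

```python
import math
from collections import Counter

def rank_sentences_tfidf(sentences: list[str], top_n: int = 3) -> list[str]:
    """Toy TF-IDF sentence ranker illustrating the idea above."""
    # Naive whitespace tokenization (the real pipeline uses spaCy
    # lemmas with stop words removed)
    docs = [s.lower().split() for s in sentences]
    n_docs = len(docs)
    # IDF: words that appear in few sentences score higher
    df = Counter(word for doc in docs for word in set(doc))
    idf = {w: math.log(n_docs / df[w]) for w in df}

    def score(doc: list[str]) -> float:
        # Mean TF-IDF weight of the sentence's distinct words
        tf = Counter(doc)
        return sum((tf[w] / len(doc)) * idf[w] for w in tf) / max(len(tf), 1)

    ranked = sorted(sentences, key=lambda s: score(s.lower().split()), reverse=True)
    return ranked[:top_n]
```

A sentence full of words that appear nowhere else in the passage outranks one made of common words, which is exactly the "informative sentence" signal the pipeline wants.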
### Module III β€” Deep Learning for NLP
| Technique | Where Used | Purpose |
|---|---|---|
| Transformers | `question_generator.py` | T5 model for question generation |
| Transfer Learning | `question_generator.py` | Using pre-trained T5 fine-tuned on SQuAD |
| Seq2Seq | `question_generator.py` | Encoder-decoder architecture of T5 |
| Beam Search | `question_generator.py` | Generate multiple question candidates, pick best |
### Module IV β€” Advanced NLP
| Technique | Where Used | Purpose |
|---|---|---|
| T5 (Text-to-Text Transfer Transformer) | `question_generator.py` | State-of-the-art QG model |
| Natural Language Generation (NLG) | `question_generator.py` | Generating grammatical questions |
| Subword Tokenization (SentencePiece) | `question_generator.py` | T5's tokenizer handles rare/unknown words |
| Pre-trained Models | `question_generator.py` | `valhalla/t5-small-qg-hl` from HuggingFace |
---
## Project Structure
```
mcq_generator/
β”‚
β”œβ”€β”€ src/ # Core NLP pipeline modules
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ preprocessor.py # Text cleaning, TF-IDF, NER, answer extraction
β”‚ β”œβ”€β”€ question_generator.py # T5-based question generation
β”‚ β”œβ”€β”€ distractor_generator.py # WordNet + NER distractor generation
β”‚ β”œβ”€β”€ mcq_builder.py # Pipeline orchestrator + MCQ dataclass
β”‚ └── evaluator.py # Answer checking and scoring
β”‚
β”œβ”€β”€ app/ # Streamlit web application
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ main.py # 3-screen app: input β†’ quiz β†’ results
β”‚ └── components.py # Reusable UI components
β”‚
β”œβ”€β”€ data/
β”‚ └── sample_passages.json # 5 test passages (ISRO, Gandhi, AI, etc.)
β”‚
β”œβ”€β”€ models/ # (gitignored) Downloaded model files
β”‚ └── README.md
β”‚
β”œβ”€β”€ notebooks/ # Jupyter notebooks for exploration
β”‚
β”œβ”€β”€ config.py # All settings in one place
β”œβ”€β”€ requirements.txt # Python dependencies
└── README.md # This file
```
---
## Each File Explained
### `config.py`
Central settings file. Every other module imports from here.
- Model name, number of questions, sentence count, file paths
- Change values here to tune the entire system without touching logic files
### `src/preprocessor.py`
The NLP foundation of the project.
**Key functions:**
- `extract_sentences(text)` β€” spaCy sentence boundary detection
- `rank_sentences(sentences)` β€” TF-IDF scoring, returns top N most informative sentences
- `extract_answer_candidates(sentence)` β€” NER-based extraction with strict quality filters
- `preprocess(text)` β€” full pipeline, returns structured dict
**Design decisions:**
- Only `PERSON`, `ORG`, `GPE`, `DATE`, `EVENT`, `WORK_OF_ART` NER labels are accepted as answers
- A `BLACKLIST` of 30+ generic words ("annual", "various", "Moon") prevents trivial answers
- Answers are sorted by priority: PERSON > ORG/GPE > DATE > others
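A stripped-down sketch of this filter-and-sort logic (the label set and priority order come from the notes above; the blacklist excerpt and function name are illustrative, and the real module reads its `(text, label)` pairs from spaCy's `doc.ents`):

```python
ACCEPTED_LABELS = {"PERSON", "ORG", "GPE", "DATE", "EVENT", "WORK_OF_ART"}
BLACKLIST = {"annual", "various", "moon"}  # excerpt of the 30+ word list
PRIORITY = {"PERSON": 0, "ORG": 1, "GPE": 1, "DATE": 2}  # others sort last

def filter_and_rank(entities: list[tuple[str, str]]) -> list[str]:
    """Keep accepted, non-blacklisted entities, highest-priority first."""
    keep = [
        (text, label) for text, label in entities
        if label in ACCEPTED_LABELS and text.lower() not in BLACKLIST
    ]
    # Stable sort preserves passage order within each priority tier
    keep.sort(key=lambda e: PRIORITY.get(e[1], 3))
    return [text for text, _ in keep]
```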
### `src/question_generator.py`
Uses the `valhalla/t5-small-qg-hl` model β€” a T5-small fine-tuned on SQuAD for question generation.
**How T5 QG works:**
```
Input: "generate question: ISRO was founded in <hl> 1969 <hl> by Vikram Sarabhai."
Output: "In what year was ISRO founded?"
```
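A minimal version of the highlight step above might look like this (deliberately naive: it wraps only the first occurrence of the answer and prepends the task prefix the model expects):

```python
def highlight_answer(sentence: str, answer: str) -> str:
    """Wrap the answer span in <hl> tags and add the T5 task prefix."""
    highlighted = sentence.replace(answer, f"<hl> {answer} <hl>", 1)
    return f"generate question: {highlighted}"
```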
**Key functions:**
- `highlight_answer(sentence, answer)` β€” wraps answer in `<hl>` tags
- `generate_question(sentence, answer)` β€” beam search with 5 beams, 3 candidates
- `answer_is_addressable(question, answer)` β€” rejects circular, vague, or short questions
**Quality filters applied:**
- Must start with a question word (what/who/when/where/which/how)
- Answer must NOT appear in the question
- Abbreviation trap detection (e.g. rejects Q: "What does ISRO stand for?" when A is the full name)
- Minimum 5 words
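These filters are simple string checks; a sketch covering three of the four (the abbreviation-trap detection is omitted for brevity, and the function name is illustrative):

```python
QUESTION_WORDS = ("what", "who", "when", "where", "which", "how")

def passes_quality_filters(question: str, answer: str) -> bool:
    """Apply the basic quality checks listed above to a candidate question."""
    q = question.strip().lower()
    if not q.startswith(QUESTION_WORDS):
        return False                  # must open with a question word
    if answer.lower() in q:
        return False                  # circular: answer leaks into the question
    if len(question.split()) < 5:
        return False                  # too short to be meaningful
    return True
```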
### `src/distractor_generator.py`
Generates 3 plausible wrong answer options. Uses a priority-based strategy chain.
**Strategy 1 β€” Same-label NER (best):**
Finds other entities of the same NER type from the passage.
```
Answer: "1969" (DATE) β†’ Distractors: ["1975", "2008", "2023"] (other DATEs in passage)
Answer: "Vikram Sarabhai" (PERSON) β†’ Distractors: ["Kalam", "Dhawan", "Nehru"]
```
**Strategy 2 β€” WordNet hyponyms:**
Navigates the WordNet hierarchy to find sibling words in the same semantic category.
```
Answer: "India" β†’ hypernym: "country" β†’ hyponyms: ["China", "Brazil", "Pakistan"]
```
**Strategy 3 β€” Cross-label fallback:**
Uses any other named entity from the passage if strategies 1 and 2 fail.
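The strategy chain can be sketched as follows (Strategy 2's WordNet hyponym lookup needs NLTK and is omitted here; the entity-tuple format and function signature are illustrative):

```python
def generate_distractors(answer: str, label: str,
                         passage_entities: list[tuple[str, str]],
                         k: int = 3) -> list[str]:
    """Priority chain sketch: same-label entities first, any-label fallback."""
    distractors: list[str] = []
    seen = {answer.lower()}
    # Strategy 1: entities sharing the answer's NER label
    for text, ent_label in passage_entities:
        if ent_label == label and text.lower() not in seen:
            distractors.append(text)
            seen.add(text.lower())
    # Strategy 3: any remaining entity, regardless of label
    for text, _ in passage_entities:
        if len(distractors) >= k:
            break
        if text.lower() not in seen:
            distractors.append(text)
            seen.add(text.lower())
    return distractors[:k]
```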
### `src/mcq_builder.py`
The single entry point that the UI calls. Orchestrates the entire pipeline.
**MCQ dataclass:**
```python
@dataclass
class MCQ:
    question: str
    options: list          # 4 shuffled options
    correct_index: int     # index of correct answer (0-3)
    correct_answer: str
    explanation: str       # original sentence
```

**Quality gate `is_valid_mcq()`:**
- No two options can be too similar (catches "WWE" vs "World Wrestling Entertainment")
- Answer must appear exactly once in options
- Maximum 1 generic placeholder option allowed
- Answer must not appear in question text
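One way to implement the similarity check with the standard library: a `difflib` ratio plus substring and initials tests, the latter catching abbreviation pairs like the WWE example (the threshold and helper names are assumptions, not the module's actual code):

```python
from difflib import SequenceMatcher

def _initials(s: str) -> str:
    """First letter of each word, e.g. 'World Wrestling Entertainment' -> 'wwe'."""
    return "".join(word[0] for word in s.split() if word)

def options_too_similar(options: list[str], threshold: float = 0.8) -> bool:
    """True if any pair of options overlaps enough to confuse the quiz."""
    lowered = [o.lower() for o in options]
    for i in range(len(lowered)):
        for j in range(i + 1, len(lowered)):
            a, b = lowered[i], lowered[j]
            if a in b or b in a:                  # one contains the other
                return True
            if a == _initials(b) or b == _initials(a):  # abbreviation pair
                return True
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                return True
    return False
```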
### `src/evaluator.py`
Checks answers and computes scores.
**Returns:**
```python
{
    "score":      7,
    "total":      10,
    "percentage": 70.0,
    "feedback":   "Good effort! Review the explanations...",
    "results":    [ {per-question breakdown} ]
}
```
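A self-contained sketch of this scoring logic (the `MCQ` stand-in carries only the fields read here; the feedback thresholds and strings are illustrative, not the module's exact values):

```python
from dataclasses import dataclass

@dataclass
class MCQ:
    """Minimal stand-in for the fields evaluate() reads from src.mcq_builder.MCQ."""
    question: str
    correct_index: int
    correct_answer: str

def evaluate(mcqs: list[MCQ], user_answers: list[int]) -> dict:
    """Score a quiz and build the per-question breakdown."""
    results, score = [], 0
    for mcq, chosen in zip(mcqs, user_answers):
        correct = chosen == mcq.correct_index
        score += correct
        results.append({
            "question": mcq.question,
            "chosen": chosen,
            "correct_answer": mcq.correct_answer,
            "is_correct": correct,
        })
    total = len(results)
    percentage = round(100 * score / total, 1) if total else 0.0
    feedback = ("Excellent!" if percentage >= 80
                else "Good effort! Review the explanations..." if percentage >= 50
                else "Keep practising.")
    return {"score": score, "total": total, "percentage": percentage,
            "feedback": feedback, "results": results}
```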
### `app/main.py`
Streamlit app with 3 screens managed via `st.session_state`:
- **Screen 1 (input):** Text area + question count slider + Generate button
- **Screen 2 (quiz):** One question at a time, radio buttons, Previous/Next/Submit
- **Screen 3 (results):** Score banner + per-question feedback with explanations
### `app/components.py`
Reusable display functions:
- `render_question_card()` β€” A/B/C/D labelled radio buttons
- `render_result_card()` β€” green (correct) / red (wrong) with explanation
- `render_score_summary()` β€” score banner + metric cards
---
## Tech Stack
| Library | Version | Purpose |
|---|---|---|
| `spaCy` | 3.7.4 | Tokenization, NER, POS tagging, sentence splitting |
| `transformers` | 4.38.2 | T5 model for question generation |
| `torch` | 2.2.1 | PyTorch backend for transformers |
| `nltk` | 3.8.1 | WordNet access for distractor generation |
| `scikit-learn` | 1.4.1.post1 | TF-IDF vectorizer |
| `sentencepiece` | latest | T5's subword tokenizer |
| `streamlit` | 1.33.0 | Web UI framework |
| `gensim` | 4.3.2 | Word2Vec / GloVe loading (optional) |
| `numpy` | 1.26.4 | TF-IDF matrix operations |
**Pre-trained model used:**
- `valhalla/t5-small-qg-hl` — T5-small fine-tuned on SQuAD v1.1 for answer-aware question generation using the highlight format. Hosted on the HuggingFace Hub and downloaded automatically on first run (~240MB).
---
## Setup & Installation
### Prerequisites
- Python 3.11+
- pip
- Internet connection (first run downloads the T5 model)
### Step 1 β€” Clone the repository
```bash
git clone https://github.com/tanmmayyy/mcq-generator.git
cd mcq-generator
```
### Step 2 β€” Create a virtual environment
```bash
python -m venv myenv
# Windows
myenv\Scripts\activate
# Mac/Linux
source myenv/bin/activate
```
### Step 3 β€” Install dependencies
```bash
pip install -r requirements.txt
pip install sentencepiece # required for T5 tokenizer
```
### Step 4 β€” Download spaCy language model
```bash
python -m spacy download en_core_web_sm
# If the command above fails, install the wheel directly:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
```
### Step 5 β€” Verify installation
```bash
python -c "import spacy; nlp = spacy.load('en_core_web_sm'); print('spaCy OK')"
python -c "from transformers import pipeline; print('Transformers OK')"
```
---
## Running the App
```bash
streamlit run app/main.py
```
The app opens at `http://localhost:8501`. On first launch, the T5 model downloads (~240MB) and loads into memory β€” this takes 1–2 minutes. Subsequent launches are fast.
---
## Testing Each Module
Run these in order to verify each step of the pipeline works independently:
```bash
# Step 1 β€” Test preprocessing (NER, TF-IDF, sentence ranking)
python src/preprocessor.py
# Step 2 β€” Test question generation (T5 model)
python src/question_generator.py
# Step 3 β€” Test distractor generation (WordNet + NER)
python src/distractor_generator.py
# Step 4 β€” Test full pipeline end-to-end
python src/mcq_builder.py
# Step 5 β€” Test scoring
python src/evaluator.py
```
---
## Sample Output
**Input passage (ISRO):**
```
The Indian Space Research Organisation (ISRO) was founded in 1969 by Vikram Sarabhai.
ISRO developed India's first satellite, Aryabhata, which was launched in 1975.
The Chandrayaan-1 mission in 2008 discovered water molecules on the Moon.
In 2023, Chandrayaan-3 successfully landed near the lunar south pole.
The Mars Orbiter Mission, also called Mangalyaan, was launched in 2013.
```
**Generated questions:**
```
Q1: Who founded ISRO?
A. Jawaharlal Nehru
B. APJ Abdul Kalam
C. Vikram Sarabhai βœ“
D. Homi Bhabha
Q2: What was India's first satellite called?
A. Chandrayaan
B. Mangalyaan
C. Rohini
D. Aryabhata βœ“
Q3: When did the Chandrayaan-1 mission take place?
A. 1975
B. 2013
C. 2023
D. 2008 βœ“
Q4: What mission made India the first Asian country to reach Mars orbit?
A. Chandrayaan-3
B. Aryabhata
C. Mangalyaan βœ“
D. Chandrayaan-1
```
---
## What Makes a Good Passage
The system performs best on **factual passages** that contain:
| Works well | Works poorly |
|---|---|
| People names (PERSON entities) | Opinion / descriptive text |
| Specific dates (DATE entities) | Passages with repeated entities |
| Organisation names (ORG entities) | Very short passages (< 5 sentences) |
| Place names (GPE entities) | Abstract/philosophical text |
| One clear fact per sentence | Sentences with multiple facts |
**Best passage types:** History, science, geography, biographies, Wikipedia-style articles
**Avoid:** Opinion pieces, marketing content, descriptive narratives without specific facts
---
## Known Limitations
1. **Passage type dependency** β€” Works best on factual text. Descriptive or opinion text produces poor questions because there are no named entities to use as answers.
2. **T5-small quality ceiling** β€” The model used (`t5-small`) has 60M parameters. Larger models like `t5-base` or `t5-large` would produce better questions but require more memory and time.
3. **Distractor diversity** β€” When a passage has few named entities, distractors may fall back to generic options. Fine-tuning a separate T5 model on the RACE dataset for distractor generation would fix this.
4. **English only** β€” The current pipeline only supports English text. Extending to Hindi or other Indic languages would require multilingual spaCy models and a multilingual QG model.
5. **No semantic deduplication** β€” Two questions from the same passage can sometimes be semantically similar even if worded differently.
---
## Future Work
- [ ] Fine-tune a T5 distractor generation model on the RACE dataset (100k exam questions)
- [ ] Add support for Hindi using IndicNLP + multilingual BERT
- [ ] Add PDF upload support so users can quiz themselves on any document
- [ ] BLEU/METEOR/ROUGE automated evaluation of generated questions
- [ ] Difficulty scoring per question based on distractor plausibility
- [ ] Export quiz as PDF for offline use
---
## Related Research
Papers that use similar approaches β€” cited for comparison:
1. **Automatic Generation of Multiple-Choice Questions (2023)**
Zhang et al. β€” T5 with pre/postprocessing pipelines for MCQ generation
https://arxiv.org/abs/2303.14576
2. **Deep Learning and Linguistic Feature Based Automatic MCQ Generation (Springer, ICDCIT 2022)**
Agarwal et al. β€” DL + linguistic features for MCQ generation (same 3-step pipeline)
https://link.springer.com/chapter/10.1007/978-3-030-94876-4_18
3. **End-to-End MCQ Generation Using T5 (ScienceDirect 2022)**
Rodriguez-Torrealba et al. β€” Full T5-based pipeline with Wikipedia passages
https://www.sciencedirect.com/science/article/pii/S0957417422014014
4. **Leaf β€” MCQ Generation System (ECIR 2022)**
Vachev et al. β€” Two fine-tuned T5 models: one for QG, one for DG on RACE
https://github.com/KristiyanVachev/Leaf-Question-Generation
5. **Automatic Distractor Generation β€” Systematic Review (PMC 2024)**
Comprehensive review of distractor generation methods including WordNet and T5
https://pmc.ncbi.nlm.nih.gov/articles/PMC11623049/
6. **Automatic Question Generation: A Review (Springer/PMC 2023)**
Mulla & Gharpure β€” Survey of methodologies, datasets, and evaluation metrics
https://pmc.ncbi.nlm.nih.gov/articles/PMC9886210/
**What differentiates this project from the above:**
- End-to-end pipeline with interactive quiz UI (most papers only generate questions)
- NER-type-matching distractor strategy (distractors always same entity type as answer)
- Multi-layer quality filtering at both question and MCQ level
- Answer circularity detection (rejects questions where answer appears in the question)
---
## Course Outcomes Covered
| CO | Description | How this project covers it |
|---|---|---|
| CO1 | Articulate NLP and word representation | TF-IDF, NER, WordNet, word embeddings all implemented and explained |
| CO2 | Build deep learning models for NLP problems | T5 transformer for QG (seq2seq), beam search decoding, transfer learning |
| CO3 | Implement ML/DL solutions in real context | End-to-end deployable system with Streamlit UI and interactive demo |
---
## Author
**Tanmay Jain**
Bennett University
---
*Built with spaCy, HuggingFace Transformers, NLTK, scikit-learn, and Streamlit.*