---
title: MCQ Generator
emoji: πŸ“
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.33.0
app_file: app/main.py
pinned: false
---
# πŸ“ MCQ Generator β€” Automatic Multiple Choice Question Generator
> **An end-to-end NLP pipeline that reads any text passage and automatically generates a complete multiple-choice quiz with scoring and explanations.**
Built as a course project for an NLP curriculum covering Modules I–IV: tokenization, word embeddings, transformers, and natural language generation.
---
## πŸ“Œ Table of Contents
1. [What This Project Does](#what-this-project-does)
2. [Live Demo](#live-demo)
3. [How It Works β€” The Full Pipeline](#how-it-works--the-full-pipeline)
4. [NLP Techniques Used](#nlp-techniques-used)
5. [Project Structure](#project-structure)
6. [Each File Explained](#each-file-explained)
7. [Tech Stack](#tech-stack)
8. [Setup & Installation](#setup--installation)
9. [Running the App](#running-the-app)
10. [Testing Each Module](#testing-each-module)
11. [Sample Output](#sample-output)
12. [What Makes a Good Passage](#what-makes-a-good-passage)
13. [Known Limitations](#known-limitations)
14. [Future Work](#future-work)
15. [Related Research](#related-research)
16. [Course Outcomes Covered](#course-outcomes-covered)
---
## What This Project Does
Given any factual text passage, this system:
1. **Extracts** the most important sentences using TF-IDF ranking
2. **Identifies** answer candidates using Named Entity Recognition (NER)
3. **Generates** natural language questions using a T5 transformer model
4. **Creates** plausible wrong options (distractors) using WordNet and NER
5. **Presents** an interactive quiz with scoring and per-question explanations
**Example:**
Input passage:
```
Albert Einstein was born on March 14, 1879, in Ulm, Germany.
He was awarded the Nobel Prize in Physics in 1921 for his
discovery of the photoelectric effect.
```
Generated MCQ:
```
Q: Where was Albert Einstein born?
A. France
B. Germany βœ“
C. United States
D. Switzerland
```
---
## Live Demo
```bash
streamlit run app/main.py
```
Opens at `http://localhost:8501` in your browser.
---
## How It Works β€” The Full Pipeline
```
Raw Text Passage
β”‚
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 1: PREPROCESSING (preprocessor.py) β”‚
β”‚ β”‚
β”‚ β€’ Split into sentences (spaCy) β”‚
β”‚ β€’ Rank by TF-IDF score (scikit-learn) β”‚
β”‚ β€’ Extract Named Entities (spaCy NER) β”‚
β”‚ β€’ Filter answer candidates (blacklist) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ top sentences + answer candidates
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 2: QUESTION GENERATION β”‚
β”‚ (question_generator.py) β”‚
β”‚ β”‚
β”‚ β€’ Highlight answer in sentence with <hl> β”‚
β”‚ β€’ Feed to T5 transformer model β”‚
β”‚ β€’ Generate 3 candidate questions β”‚
β”‚ β€’ Validate: reject circular/vague Qs β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ (question, answer) pairs
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 3: DISTRACTOR GENERATION β”‚
β”‚ (distractor_generator.py) β”‚
β”‚ β”‚
β”‚ Strategy 1: Same-type NER entities β”‚
β”‚ from the passage β”‚
β”‚ Strategy 2: WordNet hyponym siblings β”‚
β”‚ Strategy 3: Cross-label fallback β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ 3 wrong options per question
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 4: MCQ ASSEMBLY + VALIDATION β”‚
β”‚ (mcq_builder.py) β”‚
β”‚ β”‚
β”‚ β€’ Combine answer + distractors β”‚
β”‚ β€’ Shuffle options randomly β”‚
β”‚ β€’ Quality gate: dedup, similarity check β”‚
β”‚ β€’ Return list of MCQ objects β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ validated MCQ list
β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ STEP 5: QUIZ UI + SCORING β”‚
β”‚ (app/main.py + evaluator.py) β”‚
β”‚ β”‚
β”‚ β€’ Streamlit 3-screen app β”‚
β”‚ β€’ Input β†’ Quiz β†’ Results β”‚
β”‚ β€’ Score, feedback, explanations β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## NLP Techniques Used
### Module I β€” Foundational NLP
| Technique | Where Used | Purpose |
|---|---|---|
| Tokenization | `preprocessor.py` | Split text into sentences and tokens using spaCy |
| Lemmatization | `preprocessor.py` | Normalize word forms for TF-IDF |
| Stop word removal | `preprocessor.py` | Filter noise before TF-IDF scoring |
| Named Entity Recognition (NER) | `preprocessor.py` | Find PERSON, ORG, DATE, GPE as answer candidates |
| POS Tagging | `preprocessor.py` | Identify nouns and proper nouns |
| WordNet | `distractor_generator.py` | Find semantically related words as distractors |
| Synsets / Hyponyms | `distractor_generator.py` | Navigate WordNet hierarchy for same-category words |
### Module II β€” Word Representation
| Technique | Where Used | Purpose |
|---|---|---|
| TF-IDF | `preprocessor.py` | Rank sentences by information density |
| Word Embeddings (GloVe) | `distractor_generator.py` | Optional cosine-similarity based distractor finding |
**TF-IDF explained:**
- **TF (Term Frequency)** = how often a word appears in *this* sentence
- **IDF (Inverse Document Frequency)** = how rare the word is across *all* sentences
- High TF-IDF score = sentence contains rare, informative words β†’ good question source
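The ranking idea can be sketched with nothing but the standard library (the project itself uses scikit-learn's `TfidfVectorizer`; the tokenization and function name here are illustrative):

```python
import math
from collections import Counter

def rank_sentences_tfidf(sentences: list[str], top_n: int = 3) -> list[str]:
    """Toy TF-IDF sentence ranker illustrating the idea above."""
    # Naive whitespace tokenization (the real pipeline uses spaCy
    # lemmas with stop words removed)
    docs = [s.lower().split() for s in sentences]
    n_docs = len(docs)
    # IDF: words that appear in few sentences score higher
    df = Counter(word for doc in docs for word in set(doc))
    idf = {w: math.log(n_docs / df[w]) for w in df}

    def score(doc: list[str]) -> float:
        # Mean TF-IDF weight of the sentence's distinct words
        tf = Counter(doc)
        return sum((tf[w] / len(doc)) * idf[w] for w in tf) / max(len(tf), 1)

    ranked = sorted(sentences, key=lambda s: score(s.lower().split()), reverse=True)
    return ranked[:top_n]
```

A sentence full of words that appear nowhere else in the passage outranks one made of common words, which is exactly the "informative sentence" signal the pipeline wants.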
### Module III β€” Deep Learning for NLP
| Technique | Where Used | Purpose |
|---|---|---|
| Transformers | `question_generator.py` | T5 model for question generation |
| Transfer Learning | `question_generator.py` | Using pre-trained T5 fine-tuned on SQuAD |
| Seq2Seq | `question_generator.py` | Encoder-decoder architecture of T5 |
| Beam Search | `question_generator.py` | Generate multiple question candidates, pick best |
### Module IV β€” Advanced NLP
| Technique | Where Used | Purpose |
|---|---|---|
| T5 (Text-to-Text Transfer Transformer) | `question_generator.py` | State-of-the-art QG model |
| Natural Language Generation (NLG) | `question_generator.py` | Generating grammatical questions |
| Subword Tokenization (SentencePiece) | `question_generator.py` | T5's tokenizer handles rare/unknown words |
| Pre-trained Models | `question_generator.py` | `valhalla/t5-small-qg-hl` from HuggingFace |
---
## Project Structure
```
mcq_generator/
β”‚
β”œβ”€β”€ src/ # Core NLP pipeline modules
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ preprocessor.py # Text cleaning, TF-IDF, NER, answer extraction
β”‚ β”œβ”€β”€ question_generator.py # T5-based question generation
β”‚ β”œβ”€β”€ distractor_generator.py # WordNet + NER distractor generation
β”‚ β”œβ”€β”€ mcq_builder.py # Pipeline orchestrator + MCQ dataclass
β”‚ └── evaluator.py # Answer checking and scoring
β”‚
β”œβ”€β”€ app/ # Streamlit web application
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ main.py # 3-screen app: input β†’ quiz β†’ results
β”‚ └── components.py # Reusable UI components
β”‚
β”œβ”€β”€ data/
β”‚ └── sample_passages.json # 5 test passages (ISRO, Gandhi, AI, etc.)
β”‚
β”œβ”€β”€ models/ # (gitignored) Downloaded model files
β”‚ └── README.md
β”‚
β”œβ”€β”€ notebooks/ # Jupyter notebooks for exploration
β”‚
β”œβ”€β”€ config.py # All settings in one place
β”œβ”€β”€ requirements.txt # Python dependencies
└── README.md # This file
```
---
## Each File Explained
### `config.py`
Central settings file. Every other module imports from here.
- Model name, number of questions, sentence count, file paths
- Change values here to tune the entire system without touching logic files
### `src/preprocessor.py`
The NLP foundation of the project.
**Key functions:**
- `extract_sentences(text)` β€” spaCy sentence boundary detection
- `rank_sentences(sentences)` β€” TF-IDF scoring, returns top N most informative sentences
- `extract_answer_candidates(sentence)` β€” NER-based extraction with strict quality filters
- `preprocess(text)` β€” full pipeline, returns structured dict
**Design decisions:**
- Only `PERSON`, `ORG`, `GPE`, `DATE`, `EVENT`, `WORK_OF_ART` NER labels are accepted as answers
- A `BLACKLIST` of 30+ generic words ("annual", "various", "Moon") prevents trivial answers
- Answers are sorted by priority: PERSON > ORG/GPE > DATE > others
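A stripped-down sketch of this filter-and-sort logic (the label set and priority order come from the notes above; the blacklist excerpt and function name are illustrative, and the real module reads its `(text, label)` pairs from spaCy's `doc.ents`):

```python
ACCEPTED_LABELS = {"PERSON", "ORG", "GPE", "DATE", "EVENT", "WORK_OF_ART"}
BLACKLIST = {"annual", "various", "moon"}  # excerpt of the 30+ word list
PRIORITY = {"PERSON": 0, "ORG": 1, "GPE": 1, "DATE": 2}  # others sort last

def filter_and_rank(entities: list[tuple[str, str]]) -> list[str]:
    """Keep accepted, non-blacklisted entities, highest-priority first."""
    keep = [
        (text, label) for text, label in entities
        if label in ACCEPTED_LABELS and text.lower() not in BLACKLIST
    ]
    # Stable sort preserves passage order within each priority tier
    keep.sort(key=lambda e: PRIORITY.get(e[1], 3))
    return [text for text, _ in keep]
```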
### `src/question_generator.py`
Uses the `valhalla/t5-small-qg-hl` model β€” a T5-small fine-tuned on SQuAD for question generation.
**How T5 QG works:**
```
Input: "generate question: ISRO was founded in <hl> 1969 <hl> by Vikram Sarabhai."
Output: "In what year was ISRO founded?"
```
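A minimal version of the highlight step above might look like this (deliberately naive: it wraps only the first occurrence of the answer and prepends the task prefix the model expects):

```python
def highlight_answer(sentence: str, answer: str) -> str:
    """Wrap the answer span in <hl> tags and add the T5 task prefix."""
    highlighted = sentence.replace(answer, f"<hl> {answer} <hl>", 1)
    return f"generate question: {highlighted}"
```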
**Key functions:**
- `highlight_answer(sentence, answer)` β€” wraps answer in `<hl>` tags
- `generate_question(sentence, answer)` β€” beam search with 5 beams, 3 candidates
- `answer_is_addressable(question, answer)` β€” rejects circular, vague, or short questions
**Quality filters applied:**
- Must start with a question word (what/who/when/where/which/how)
- Answer must NOT appear in the question
- Abbreviation trap detection (e.g. rejects Q: "What does ISRO stand for?" when A is the full name)
- Minimum 5 words
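These filters are simple string checks; a sketch covering three of the four (the abbreviation-trap detection is omitted for brevity, and the function name is illustrative):

```python
QUESTION_WORDS = ("what", "who", "when", "where", "which", "how")

def passes_quality_filters(question: str, answer: str) -> bool:
    """Apply the basic quality checks listed above to a candidate question."""
    q = question.strip().lower()
    if not q.startswith(QUESTION_WORDS):
        return False                  # must open with a question word
    if answer.lower() in q:
        return False                  # circular: answer leaks into the question
    if len(question.split()) < 5:
        return False                  # too short to be meaningful
    return True
```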
### `src/distractor_generator.py`
Generates 3 plausible wrong answer options. Uses a priority-based strategy chain.
**Strategy 1 β€” Same-label NER (best):**
Finds other entities of the same NER type from the passage.
```
Answer: "1969" (DATE) β†’ Distractors: ["1975", "2008", "2023"] (other DATEs in passage)
Answer: "Vikram Sarabhai" (PERSON) β†’ Distractors: ["Kalam", "Dhawan", "Nehru"]
```
**Strategy 2 β€” WordNet hyponyms:**
Navigates the WordNet hierarchy to find sibling words in the same semantic category.
```
Answer: "India" β†’ hypernym: "country" β†’ hyponyms: ["China", "Brazil", "Pakistan"]
```
**Strategy 3 β€” Cross-label fallback:**
Uses any other named entity from the passage if strategies 1 and 2 fail.
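The strategy chain can be sketched as follows (Strategy 2's WordNet hyponym lookup needs NLTK and is omitted here; the entity-tuple format and function signature are illustrative):

```python
def generate_distractors(answer: str, label: str,
                         passage_entities: list[tuple[str, str]],
                         k: int = 3) -> list[str]:
    """Priority chain sketch: same-label entities first, any-label fallback."""
    distractors: list[str] = []
    seen = {answer.lower()}
    # Strategy 1: entities sharing the answer's NER label
    for text, ent_label in passage_entities:
        if ent_label == label and text.lower() not in seen:
            distractors.append(text)
            seen.add(text.lower())
    # Strategy 3: any remaining entity, regardless of label
    for text, _ in passage_entities:
        if len(distractors) >= k:
            break
        if text.lower() not in seen:
            distractors.append(text)
            seen.add(text.lower())
    return distractors[:k]
```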
### `src/mcq_builder.py`
The single entry point that the UI calls. Orchestrates the entire pipeline.
**MCQ dataclass:**
```python
@dataclass
class MCQ:
    question: str
    options: list          # 4 shuffled options
    correct_index: int     # index of correct answer (0-3)
    correct_answer: str
    explanation: str       # original sentence
```

**Quality gate `is_valid_mcq()`:**
- No two options can be too similar (catches "WWE" vs "World Wrestling Entertainment")
- Answer must appear exactly once in options
- Maximum 1 generic placeholder option allowed
- Answer must not appear in question text
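One way to implement the similarity check with the standard library: a `difflib` ratio plus substring and initials tests, the latter catching abbreviation pairs like the WWE example (the threshold and helper names are assumptions, not the module's actual code):

```python
from difflib import SequenceMatcher

def _initials(s: str) -> str:
    """First letter of each word, e.g. 'World Wrestling Entertainment' -> 'wwe'."""
    return "".join(word[0] for word in s.split() if word)

def options_too_similar(options: list[str], threshold: float = 0.8) -> bool:
    """True if any pair of options overlaps enough to confuse the quiz."""
    lowered = [o.lower() for o in options]
    for i in range(len(lowered)):
        for j in range(i + 1, len(lowered)):
            a, b = lowered[i], lowered[j]
            if a in b or b in a:                  # one contains the other
                return True
            if a == _initials(b) or b == _initials(a):  # abbreviation pair
                return True
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                return True
    return False
```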
### `src/evaluator.py`
Checks answers and computes scores.
**Returns:**
```python
{
    "score":      7,
    "total":      10,
    "percentage": 70.0,
    "feedback":   "Good effort! Review the explanations...",
    "results":    [ {per-question breakdown} ]
}
```
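A self-contained sketch of this scoring logic (the `MCQ` stand-in carries only the fields read here; the feedback thresholds and strings are illustrative, not the module's exact values):

```python
from dataclasses import dataclass

@dataclass
class MCQ:
    """Minimal stand-in for the fields evaluate() reads from src.mcq_builder.MCQ."""
    question: str
    correct_index: int
    correct_answer: str

def evaluate(mcqs: list[MCQ], user_answers: list[int]) -> dict:
    """Score a quiz and build the per-question breakdown."""
    results, score = [], 0
    for mcq, chosen in zip(mcqs, user_answers):
        correct = chosen == mcq.correct_index
        score += correct
        results.append({
            "question": mcq.question,
            "chosen": chosen,
            "correct_answer": mcq.correct_answer,
            "is_correct": correct,
        })
    total = len(results)
    percentage = round(100 * score / total, 1) if total else 0.0
    feedback = ("Excellent!" if percentage >= 80
                else "Good effort! Review the explanations..." if percentage >= 50
                else "Keep practising.")
    return {"score": score, "total": total, "percentage": percentage,
            "feedback": feedback, "results": results}
```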
### `app/main.py`
Streamlit app with 3 screens managed via `st.session_state`:
- **Screen 1 (input):** Text area + question count slider + Generate button
- **Screen 2 (quiz):** One question at a time, radio buttons, Previous/Next/Submit
- **Screen 3 (results):** Score banner + per-question feedback with explanations
### `app/components.py`
Reusable display functions:
- `render_question_card()` β€” A/B/C/D labelled radio buttons
- `render_result_card()` β€” green (correct) / red (wrong) with explanation
- `render_score_summary()` β€” score banner + metric cards
---
## Tech Stack
| Library | Version | Purpose |
|---|---|---|
| `spaCy` | 3.7.4 | Tokenization, NER, POS tagging, sentence splitting |
| `transformers` | 4.38.2 | T5 model for question generation |
| `torch` | 2.2.1 | PyTorch backend for transformers |
| `nltk` | 3.8.1 | WordNet access for distractor generation |
| `scikit-learn` | 1.4.1.post1 | TF-IDF vectorizer |
| `sentencepiece` | latest | T5's subword tokenizer |
| `streamlit` | 1.33.0 | Web UI framework |
| `gensim` | 4.3.2 | Word2Vec / GloVe loading (optional) |
| `numpy` | 1.26.4 | TF-IDF matrix operations |
**Pre-trained model used:**
- `valhalla/t5-small-qg-hl` — T5-small fine-tuned on SQuAD v1.1 for answer-aware question generation using the highlight format. Hosted on the HuggingFace Hub and downloaded automatically on first run (~240MB).
---
## Setup & Installation
### Prerequisites
- Python 3.11+
- pip
- Internet connection (first run downloads the T5 model)
### Step 1 β€” Clone the repository
```bash
git clone https://github.com/tanmmayyy/mcq-generator.git
cd mcq-generator
```
### Step 2 β€” Create a virtual environment
```bash
python -m venv myenv
# Windows
myenv\Scripts\activate
# Mac/Linux
source myenv/bin/activate
```
### Step 3 β€” Install dependencies
```bash
pip install -r requirements.txt
pip install sentencepiece # required for T5 tokenizer
```
### Step 4 β€” Download spaCy language model
```bash
python -m spacy download en_core_web_sm
# If the command above fails, install the wheel directly:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
```
### Step 5 β€” Verify installation
```bash
python -c "import spacy; nlp = spacy.load('en_core_web_sm'); print('spaCy OK')"
python -c "from transformers import pipeline; print('Transformers OK')"
```
---
## Running the App
```bash
streamlit run app/main.py
```
The app opens at `http://localhost:8501`. On first launch, the T5 model downloads (~240MB) and loads into memory β€” this takes 1–2 minutes. Subsequent launches are fast.
---
## Testing Each Module
Run these in order to verify each step of the pipeline works independently:
```bash
# Step 1 β€” Test preprocessing (NER, TF-IDF, sentence ranking)
python src/preprocessor.py
# Step 2 β€” Test question generation (T5 model)
python src/question_generator.py
# Step 3 β€” Test distractor generation (WordNet + NER)
python src/distractor_generator.py
# Step 4 β€” Test full pipeline end-to-end
python src/mcq_builder.py
# Step 5 β€” Test scoring
python src/evaluator.py
```
---
## Sample Output
**Input passage (ISRO):**
```
The Indian Space Research Organisation (ISRO) was founded in 1969 by Vikram Sarabhai.
ISRO developed India's first satellite, Aryabhata, which was launched in 1975.
The Chandrayaan-1 mission in 2008 discovered water molecules on the Moon.
In 2023, Chandrayaan-3 successfully landed near the lunar south pole.
The Mars Orbiter Mission, also called Mangalyaan, was launched in 2013.
```
**Generated questions:**
```
Q1: Who founded ISRO?
A. Jawaharlal Nehru
B. APJ Abdul Kalam
C. Vikram Sarabhai βœ“
D. Homi Bhabha
Q2: What was India's first satellite called?
A. Chandrayaan
B. Mangalyaan
C. Rohini
D. Aryabhata βœ“
Q3: When did the Chandrayaan-1 mission take place?
A. 1975
B. 2013
C. 2023
D. 2008 βœ“
Q4: What mission made India the first Asian country to reach Mars orbit?
A. Chandrayaan-3
B. Aryabhata
C. Mangalyaan βœ“
D. Chandrayaan-1
```
---
## What Makes a Good Passage
The system performs best on **factual passages** that contain:
| Works well | Works poorly |
|---|---|
| People names (PERSON entities) | Opinion / descriptive text |
| Specific dates (DATE entities) | Passages with repeated entities |
| Organisation names (ORG entities) | Very short passages (< 5 sentences) |
| Place names (GPE entities) | Abstract/philosophical text |
| One clear fact per sentence | Sentences with multiple facts |
**Best passage types:** History, science, geography, biographies, Wikipedia-style articles
**Avoid:** Opinion pieces, marketing content, descriptive narratives without specific facts
---
## Known Limitations
1. **Passage type dependency** β€” Works best on factual text. Descriptive or opinion text produces poor questions because there are no named entities to use as answers.
2. **T5-small quality ceiling** β€” The model used (`t5-small`) has 60M parameters. Larger models like `t5-base` or `t5-large` would produce better questions but require more memory and time.
3. **Distractor diversity** β€” When a passage has few named entities, distractors may fall back to generic options. Fine-tuning a separate T5 model on the RACE dataset for distractor generation would fix this.
4. **English only** β€” The current pipeline only supports English text. Extending to Hindi or other Indic languages would require multilingual spaCy models and a multilingual QG model.
5. **No semantic deduplication** β€” Two questions from the same passage can sometimes be semantically similar even if worded differently.
---
## Future Work
- [ ] Fine-tune a T5 distractor generation model on the RACE dataset (100k exam questions)
- [ ] Add support for Hindi using IndicNLP + multilingual BERT
- [ ] Add PDF upload support so users can quiz themselves on any document
- [ ] BLEU/METEOR/ROUGE automated evaluation of generated questions
- [ ] Difficulty scoring per question based on distractor plausibility
- [ ] Export quiz as PDF for offline use
---
## Related Research
Papers that use similar approaches β€” cited for comparison:
1. **Automatic Generation of Multiple-Choice Questions (2023)**
Zhang et al. β€” T5 with pre/postprocessing pipelines for MCQ generation
https://arxiv.org/abs/2303.14576
2. **Deep Learning and Linguistic Feature Based Automatic MCQ Generation (Springer, ICDCIT 2022)**
Agarwal et al. β€” DL + linguistic features for MCQ generation (same 3-step pipeline)
https://link.springer.com/chapter/10.1007/978-3-030-94876-4_18
3. **End-to-End MCQ Generation Using T5 (ScienceDirect 2022)**
Rodriguez-Torrealba et al. β€” Full T5-based pipeline with Wikipedia passages
https://www.sciencedirect.com/science/article/pii/S0957417422014014
4. **Leaf β€” MCQ Generation System (ECIR 2022)**
Vachev et al. β€” Two fine-tuned T5 models: one for QG, one for DG on RACE
https://github.com/KristiyanVachev/Leaf-Question-Generation
5. **Automatic Distractor Generation β€” Systematic Review (PMC 2024)**
Comprehensive review of distractor generation methods including WordNet and T5
https://pmc.ncbi.nlm.nih.gov/articles/PMC11623049/
6. **Automatic Question Generation: A Review (Springer/PMC 2023)**
Mulla & Gharpure β€” Survey of methodologies, datasets, and evaluation metrics
https://pmc.ncbi.nlm.nih.gov/articles/PMC9886210/
**What differentiates this project from the above:**
- End-to-end pipeline with interactive quiz UI (most papers only generate questions)
- NER-type-matching distractor strategy (distractors always same entity type as answer)
- Multi-layer quality filtering at both question and MCQ level
- Answer circularity detection (rejects questions where answer appears in the question)
---
## Course Outcomes Covered
| CO | Description | How this project covers it |
|---|---|---|
| CO1 | Articulate NLP and word representation | TF-IDF, NER, WordNet, word embeddings all implemented and explained |
| CO2 | Build deep learning models for NLP problems | T5 transformer for QG (seq2seq), beam search decoding, transfer learning |
| CO3 | Implement ML/DL solutions in real context | End-to-end deployable system with Streamlit UI and interactive demo |
---
## Author
**Tanmay Jain**
Bennett University
---
*Built with spaCy, HuggingFace Transformers, NLTK, scikit-learn, and Streamlit.*