---
title: MCQ Generator
emoji: 📝
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.33.0
app_file: app/main.py
pinned: false
---

# 📝 MCQ Generator — Automatic Multiple Choice Question Generator

> **An end-to-end NLP pipeline that reads any text passage and automatically generates a complete multiple-choice quiz with scoring and explanations.**

Built as a course project for an NLP curriculum covering Modules I–IV: tokenization, word embeddings, transformers, and natural language generation.

---

## 📌 Table of Contents

1. [What This Project Does](#what-this-project-does)
2. [Live Demo](#live-demo)
3. [How It Works — The Full Pipeline](#how-it-works--the-full-pipeline)
4. [NLP Techniques Used](#nlp-techniques-used)
5. [Project Structure](#project-structure)
6. [Each File Explained](#each-file-explained)
7. [Tech Stack](#tech-stack)
8. [Setup & Installation](#setup--installation)
9. [Running the App](#running-the-app)
10. [Testing Each Module](#testing-each-module)
11. [Sample Output](#sample-output)
12. [What Makes a Good Passage](#what-makes-a-good-passage)
13. [Known Limitations](#known-limitations)
14. [Future Work](#future-work)
15. [Related Research](#related-research)
16. [Course Outcomes Covered](#course-outcomes-covered)

---

## What This Project Does

Given any factual text passage, this system:

1. **Extracts** the most important sentences using TF-IDF ranking
2. **Identifies** answer candidates using Named Entity Recognition (NER)
3. **Generates** natural language questions using a T5 transformer model
4. **Creates** plausible wrong options (distractors) using WordNet and NER
5. **Presents** an interactive quiz with scoring and per-question explanations

**Example:**

Input passage:
```
Albert Einstein was born on March 14, 1879, in Ulm, Germany.
He was awarded the Nobel Prize in Physics in 1921 for his
discovery of the photoelectric effect.
```

Generated MCQ:
```
Q: Where was Albert Einstein born?

A. France
B. Germany  ✓
C. United States
D. Switzerland
```

---

## Live Demo

```bash
streamlit run app/main.py
```

Opens at `http://localhost:8501` in your browser.

---

## How It Works — The Full Pipeline

```
Raw Text Passage
       │
       ▼
┌─────────────────────────────────────────────┐
│  STEP 1: PREPROCESSING  (preprocessor.py)   │
│                                             │
│  • Split into sentences (spaCy)             │
│  • Rank by TF-IDF score (scikit-learn)      │
│  • Extract Named Entities (spaCy NER)       │
│  • Filter answer candidates (blacklist)     │
└─────────────────┬───────────────────────────┘
                  │  top sentences + answer candidates
                  ▼
┌─────────────────────────────────────────────┐
│  STEP 2: QUESTION GENERATION                │
│          (question_generator.py)            │
│                                             │
│  • Highlight answer in sentence with <hl>   │
│  • Feed to T5 transformer model             │
│  • Generate 3 candidate questions           │
│  • Validate: reject circular/vague Qs       │
└─────────────────┬───────────────────────────┘
                  │  (question, answer) pairs
                  ▼
┌─────────────────────────────────────────────┐
│  STEP 3: DISTRACTOR GENERATION              │
│          (distractor_generator.py)          │
│                                             │
│  Strategy 1: Same-type NER entities         │
│              from the passage               │
│  Strategy 2: WordNet hyponym siblings       │
│  Strategy 3: Cross-label fallback           │
└─────────────────┬───────────────────────────┘
                  │  3 wrong options per question
                  ▼
┌─────────────────────────────────────────────┐
│  STEP 4: MCQ ASSEMBLY + VALIDATION          │
│          (mcq_builder.py)                   │
│                                             │
│  • Combine answer + distractors             │
│  • Shuffle options randomly                 │
│  • Quality gate: dedup, similarity check    │
│  • Return list of MCQ objects               │
└─────────────────┬───────────────────────────┘
                  │  validated MCQ list
                  ▼
┌─────────────────────────────────────────────┐
│  STEP 5: QUIZ UI + SCORING                  │
│          (app/main.py + evaluator.py)       │
│                                             │
│  • Streamlit 3-screen app                   │
│  • Input → Quiz → Results                   │
│  • Score, feedback, explanations            │
└─────────────────────────────────────────────┘
```

---

## NLP Techniques Used

### Module I — Foundational NLP
| Technique | Where Used | Purpose |
|---|---|---|
| Tokenization | `preprocessor.py` | Split text into sentences and tokens using spaCy |
| Lemmatization | `preprocessor.py` | Normalize word forms for TF-IDF |
| Stop word removal | `preprocessor.py` | Filter noise before TF-IDF scoring |
| Named Entity Recognition (NER) | `preprocessor.py` | Find PERSON, ORG, DATE, GPE as answer candidates |
| POS Tagging | `preprocessor.py` | Identify nouns and proper nouns |
| WordNet | `distractor_generator.py` | Find semantically related words as distractors |
| Synsets / Hyponyms | `distractor_generator.py` | Navigate WordNet hierarchy for same-category words |

### Module II — Word Representation
| Technique | Where Used | Purpose |
|---|---|---|
| TF-IDF | `preprocessor.py` | Rank sentences by information density |
| Word Embeddings (GloVe) | `distractor_generator.py` | Optional cosine-similarity based distractor finding |

**TF-IDF explained:**
- **TF (Term Frequency)** = how often a word appears in *this* sentence
- **IDF (Inverse Document Frequency)** = how rare the word is across *all* sentences
- High TF-IDF score = sentence contains rare, informative words → good question source
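
The scoring idea above can be sketched in plain Python. This is only an illustration, not the project's code: the project uses scikit-learn's TF-IDF vectorizer, whereas this hand-rolled version treats each sentence as a document, skips lemmatization and stop word removal, and uses the hypothetical names `tokenize` and `rank_by_tfidf`.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (a crude stand-in for spaCy)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def rank_by_tfidf(sentences: list[str], top_n: int = 3) -> list[str]:
    """Return the top_n sentences, highest summed TF-IDF first."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: number of sentences containing each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    def score(tokens: list[str]) -> float:
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        # TF is the within-sentence relative frequency; IDF is smoothed
        # so that words present in every sentence still get a small weight.
        return sum(
            (count / len(tokens)) * (math.log((1 + n) / (1 + df[word])) + 1.0)
            for word, count in tf.items()
        )

    ranked = sorted(
        zip(sentences, tokenized), key=lambda pair: score(pair[1]), reverse=True
    )
    return [sentence for sentence, _ in ranked[:top_n]]
```

Because TF is normalized by sentence length, the score is effectively the average rarity of a sentence's words, so long sentences are not favoured just for being long.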

### Module III — Deep Learning for NLP
| Technique | Where Used | Purpose |
|---|---|---|
| Transformers | `question_generator.py` | T5 model for question generation |
| Transfer Learning | `question_generator.py` | Using pre-trained T5 fine-tuned on SQuAD |
| Seq2Seq | `question_generator.py` | Encoder-decoder architecture of T5 |
| Beam Search | `question_generator.py` | Generate multiple question candidates, pick best |

### Module IV — Advanced NLP
| Technique | Where Used | Purpose |
|---|---|---|
| T5 (Text-to-Text Transfer Transformer) | `question_generator.py` | Text-to-text model fine-tuned for question generation |
| Natural Language Generation (NLG) | `question_generator.py` | Generating grammatical questions |
| Subword Tokenization (SentencePiece) | `question_generator.py` | T5's tokenizer handles rare/unknown words |
| Pre-trained Models | `question_generator.py` | `valhalla/t5-small-qg-hl` from HuggingFace |

---

## Project Structure

```
mcq_generator/
│
├── src/                          # Core NLP pipeline modules
│   ├── __init__.py
│   ├── preprocessor.py           # Text cleaning, TF-IDF, NER, answer extraction
│   ├── question_generator.py     # T5-based question generation
│   ├── distractor_generator.py   # WordNet + NER distractor generation
│   ├── mcq_builder.py            # Pipeline orchestrator + MCQ dataclass
│   └── evaluator.py              # Answer checking and scoring
│
├── app/                          # Streamlit web application
│   ├── __init__.py
│   ├── main.py                   # 3-screen app: input → quiz → results
│   └── components.py             # Reusable UI components
│
├── data/
│   └── sample_passages.json      # 5 test passages (ISRO, Gandhi, AI, etc.)
│
├── models/                       # (gitignored) Downloaded model files
│   └── README.md
│
├── notebooks/                    # Jupyter notebooks for exploration
│
├── config.py                     # All settings in one place
├── requirements.txt              # Python dependencies
└── README.md                     # This file
```

---

## Each File Explained

### `config.py`
Central settings file. Every other module imports from here.
- Model name, number of questions, sentence count, file paths
- Change values here to tune the entire system without touching logic files

### `src/preprocessor.py`
The NLP foundation of the project.

**Key functions:**
- `extract_sentences(text)` — spaCy sentence boundary detection
- `rank_sentences(sentences)` — TF-IDF scoring, returns top N most informative sentences
- `extract_answer_candidates(sentence)` — NER-based extraction with strict quality filters
- `preprocess(text)` — full pipeline, returns structured dict

**Design decisions:**
- Only `PERSON`, `ORG`, `GPE`, `DATE`, `EVENT`, `WORK_OF_ART` NER labels are accepted as answers
- A `BLACKLIST` of 30+ generic words ("annual", "various", "Moon") prevents trivial answers
- Answers are sorted by priority: PERSON > ORG/GPE > DATE > others
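
A minimal sketch of these filtering rules, assuming entities arrive as `(text, label)` pairs from spaCy. The function name `filter_candidates` is hypothetical, and the blacklist holds only the three words named above, not the project's full list.

```python
ALLOWED_LABELS = {"PERSON", "ORG", "GPE", "DATE", "EVENT", "WORK_OF_ART"}

# Only the three blacklisted words mentioned in this README; the real list is longer.
BLACKLIST = {"annual", "various", "moon"}

# Lower number means higher priority: PERSON > ORG/GPE > DATE > others.
PRIORITY = {"PERSON": 0, "ORG": 1, "GPE": 1, "DATE": 2}


def filter_candidates(entities: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep allowed, non-blacklisted entities and sort them by label priority."""
    kept = [
        (text.strip(), label)
        for text, label in entities
        if label in ALLOWED_LABELS and text.strip().lower() not in BLACKLIST
    ]
    # sorted() is stable, so entities with equal priority keep passage order.
    return sorted(kept, key=lambda entity: PRIORITY.get(entity[1], 3))
```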

### `src/question_generator.py`
Uses the `valhalla/t5-small-qg-hl` model — a T5-small fine-tuned on SQuAD for question generation.

**How T5 QG works:**
```
Input:  "generate question: ISRO was founded in <hl> 1969 <hl> by Vikram Sarabhai."
Output: "In what year was ISRO founded?"
```

**Key functions:**
- `highlight_answer(sentence, answer)` — wraps answer in `<hl>` tags
- `generate_question(sentence, answer)` — beam search with 5 beams, 3 candidates
- `answer_is_addressable(question, answer)` — rejects circular, vague, or short questions
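
As an illustration of the highlight format, here is one possible way to build the model input string. It is a sketch under assumptions: the project's own `highlight_answer` may handle repeated or missing answers differently, and `build_model_input` is a hypothetical helper.

```python
def highlight_answer(sentence: str, answer: str) -> str:
    """Wrap the first occurrence of the answer in <hl> markers."""
    start = sentence.find(answer)
    if start == -1:
        raise ValueError(f"answer {answer!r} not found in sentence")
    end = start + len(answer)
    return f"{sentence[:start]}<hl> {answer} <hl>{sentence[end:]}"


def build_model_input(sentence: str, answer: str) -> str:
    """Prefix the highlighted sentence with the task prompt shown above."""
    return f"generate question: {highlight_answer(sentence, answer)}"
```

The resulting string is what gets tokenized and passed to the T5 model.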

**Quality filters applied:**
- Must start with a question word (what/who/when/where/which/how)
- Answer must NOT appear in the question
- Abbreviation trap detection (e.g. rejects Q: "What does ISRO stand for?" when A is the full name)
- Minimum 5 words
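
The four filters can be sketched as a single predicate. This is an assumption-laden illustration rather than the project's `answer_is_addressable`: the name `passes_question_filters` is hypothetical, and the abbreviation check here simply looks for the answer's capitalized initials as a word in the question.

```python
import re

QUESTION_WORDS = ("what", "who", "when", "where", "which", "how")
MIN_WORDS = 5


def passes_question_filters(question: str, answer: str) -> bool:
    """Return True only if the question passes all four filters."""
    words = question.strip().split()
    if len(words) < MIN_WORDS:
        return False
    if words[0].lower() not in QUESTION_WORDS:
        return False
    # Circularity: the answer must not be given away in the question.
    if answer.lower() in question.lower():
        return False
    # Abbreviation trap: the question contains an acronym of the answer,
    # e.g. "ISRO" when the answer is "Indian Space Research Organisation".
    initials = "".join(word[0] for word in answer.split() if word[0].isupper())
    if len(initials) >= 2 and initials in set(re.findall(r"[A-Za-z0-9]+", question)):
        return False
    return True
```

A strict first-word check like this one rejects phrasings such as "In what year...", so a production filter may prefer to look at the first few words instead.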

### `src/distractor_generator.py`
Generates 3 plausible wrong answer options. Uses a priority-based strategy chain.

**Strategy 1 — Same-label NER (best):**
Finds other entities of the same NER type from the passage.
```
Answer: "1969" (DATE) → Distractors: ["1975", "2008", "2023"]  (other DATEs in passage)
Answer: "Vikram Sarabhai" (PERSON) → Distractors: ["Kalam", "Dhawan", "Nehru"]
```

**Strategy 2 — WordNet hyponyms:**
Navigates the WordNet hierarchy to find sibling words in the same semantic category.
```
Answer: "India" → hypernym: "country" → hyponyms: ["China", "Brazil", "Pakistan"]
```

**Strategy 3 — Cross-label fallback:**
Uses any other named entity from the passage if strategies 1 and 2 fail.
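
The chain can be sketched as follows. This is a simplified illustration, not the project's implementation: `pick_distractors` is a hypothetical name, the WordNet lookup is passed in as an optional callable so the sketch runs without NLTK, and it assumes later strategies only top up what earlier ones could not fill.

```python
from typing import Callable, Iterable, Optional

Entity = tuple[str, str]  # (text, NER label)


def pick_distractors(
    answer: str,
    answer_label: str,
    passage_entities: list[Entity],
    related_words: Optional[Callable[[str], list[str]]] = None,
    needed: int = 3,
) -> list[str]:
    """Collect up to `needed` distractors, trying the strategies in order."""
    chosen: list[str] = []

    def add(candidates: Iterable[str]) -> None:
        for text in candidates:
            if len(chosen) == needed:
                return
            already_used = {c.lower() for c in chosen} | {answer.lower()}
            if text.lower() not in already_used:
                chosen.append(text)

    # Strategy 1: entities with the same NER label as the answer.
    add(text for text, label in passage_entities if label == answer_label)
    # Strategy 2: semantically related words (WordNet in the real project).
    if len(chosen) < needed and related_words is not None:
        add(related_words(answer))
    # Strategy 3: any other entity from the passage.
    if len(chosen) < needed:
        add(text for text, label in passage_entities if label != answer_label)
    return chosen
```

If all three strategies together yield fewer than `needed` options, the caller has to decide whether to drop the question or pad it with generic options.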

### `src/mcq_builder.py`
The single entry point that the UI calls. Orchestrates the entire pipeline.

**MCQ dataclass:**
```python
from dataclasses import dataclass


@dataclass
class MCQ:
    question: str
    options: list[str]       # 4 shuffled options
    correct_index: int       # index of the correct answer (0-3)
    correct_answer: str
    explanation: str         # original sentence
```
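
A minimal sketch of how such an object could be assembled from one answer and its distractors. The helper `assemble_mcq` and its `rng` parameter are hypothetical; the dataclass is repeated so the block runs on its own.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCQ:
    question: str
    options: list[str]
    correct_index: int
    correct_answer: str
    explanation: str


def assemble_mcq(
    question: str,
    answer: str,
    distractors: list[str],
    source_sentence: str,
    rng: Optional[random.Random] = None,
) -> MCQ:
    """Shuffle the answer in with the distractors and record where it landed."""
    rng = rng or random.Random()
    options = [answer, *distractors]
    rng.shuffle(options)
    return MCQ(
        question=question,
        options=options,
        correct_index=options.index(answer),
        correct_answer=answer,
        explanation=source_sentence,
    )
```

Passing a seeded `random.Random` makes the option order reproducible, which helps when writing tests.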

**Quality gate `is_valid_mcq()`:**
- No two options can be too similar (catches "WWE" vs "World Wrestling Entertainment")
- Answer must appear exactly once in options
- Maximum 1 generic placeholder option allowed
- Answer must not appear in question text
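
The checks above might look roughly like this. It is a sketch, not the project's `is_valid_mcq()`: the function names, the placeholder strings, and the 0.8 similarity threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical placeholder strings; the project's real list is not shown here.
GENERIC_PLACEHOLDERS = {"none of the above", "all of the above"}
SIMILARITY_THRESHOLD = 0.8  # assumed cut-off for "too similar"


def initials(text: str) -> str:
    """Lowercased initials of a multi-word phrase, or '' for a single word."""
    words = text.split()
    return "".join(word[0] for word in words).lower() if len(words) > 1 else ""


def too_similar(a: str, b: str) -> bool:
    a_norm, b_norm = a.strip().lower(), b.strip().lower()
    if a_norm == b_norm:
        return True
    # Acronym versus expansion, e.g. "WWE" and "World Wrestling Entertainment".
    if a_norm == initials(b) or b_norm == initials(a):
        return True
    return SequenceMatcher(None, a_norm, b_norm).ratio() >= SIMILARITY_THRESHOLD


def passes_quality_gate(question: str, options: list[str], answer: str) -> bool:
    """Apply the four checks listed above to one assembled question."""
    if options.count(answer) != 1:
        return False
    if answer.lower() in question.lower():
        return False
    if sum(option.lower() in GENERIC_PLACEHOLDERS for option in options) > 1:
        return False
    return not any(
        too_similar(options[i], options[j])
        for i in range(len(options))
        for j in range(i + 1, len(options))
    )
```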

### `src/evaluator.py`
Checks answers and computes scores.

**Returns:**
```python
{
  "score"     : 7,
  "total"     : 10,
  "percentage": 70.0,
  "feedback"  : "Good effort! Review the explanations...",
  "results"   : [...]   # per-question breakdown, one entry per question
}
```
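
A minimal sketch of how the numeric fields could be computed, assuming answers are compared by option index. The name `score_quiz` and the keys inside each result entry are hypothetical, and the `feedback` string is left out because its thresholds are not documented here.

```python
from typing import Optional


def score_quiz(correct_indices: list[int], chosen_indices: list[Optional[int]]) -> dict:
    """Compare chosen option indices with the correct ones and summarize."""
    results = [
        {
            "question_number": number,
            "chosen_index": chosen,  # None means the question was skipped
            "correct_index": correct,
            "is_correct": chosen == correct,
        }
        for number, (correct, chosen) in enumerate(
            zip(correct_indices, chosen_indices), start=1
        )
    ]
    score = sum(result["is_correct"] for result in results)
    total = len(results)
    percentage = round(100 * score / total, 1) if total else 0.0
    return {"score": score, "total": total, "percentage": percentage, "results": results}
```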

### `app/main.py`
Streamlit app with 3 screens managed via `st.session_state`:
- **Screen 1 (input):** Text area + question count slider + Generate button
- **Screen 2 (quiz):** One question at a time, radio buttons, Previous/Next/Submit
- **Screen 3 (results):** Score banner + per-question feedback with explanations

### `app/components.py`
Reusable display functions:
- `render_question_card()` — A/B/C/D labelled radio buttons
- `render_result_card()` — green (correct) / red (wrong) with explanation
- `render_score_summary()` — score banner + metric cards

---

## Tech Stack

| Library | Version | Purpose |
|---|---|---|
| `spaCy` | 3.7.4 | Tokenization, NER, POS tagging, sentence splitting |
| `transformers` | 4.38.2 | T5 model for question generation |
| `torch` | 2.2.1 | PyTorch backend for transformers |
| `nltk` | 3.8.1 | WordNet access for distractor generation |
| `scikit-learn` | 1.4.1.post1 | TF-IDF vectorizer |
| `sentencepiece` | latest | T5's subword tokenizer |
| `streamlit` | 1.33.0 | Web UI framework |
| `gensim` | 4.3.2 | Word2Vec / GloVe loading (optional) |
| `numpy` | 1.26.4 | TF-IDF matrix operations |

**Pre-trained model used:**
- `valhalla/t5-small-qg-hl` — T5-small fine-tuned on SQuAD 1.0 for answer-aware question generation using highlight format. Hosted on HuggingFace Hub, downloaded automatically on first run (~240MB).

---

## Setup & Installation

### Prerequisites
- Python 3.11+
- pip
- Internet connection (first run downloads the T5 model)

### Step 1 — Clone the repository
```bash
git clone https://github.com/tanmmayyy/mcq-generator.git
cd mcq-generator
```

### Step 2 — Create a virtual environment
```bash
python -m venv myenv

# Windows
myenv\Scripts\activate

# Mac/Linux
source myenv/bin/activate
```

### Step 3 — Install dependencies
```bash
pip install -r requirements.txt
pip install sentencepiece   # required for T5 tokenizer
```

### Step 4 — Download spaCy language model
```bash
python -m spacy download en_core_web_sm

# If the default command fails, install the model wheel directly:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
```

### Step 5 — Verify installation
```bash
python -c "import spacy; nlp = spacy.load('en_core_web_sm'); print('spaCy OK')"
python -c "from transformers import pipeline; print('Transformers OK')"
```

---

## Running the App

```bash
streamlit run app/main.py
```

The app opens at `http://localhost:8501`. On first launch, the T5 model (~240MB) is downloaded and loaded into memory, which takes 1–2 minutes. Later launches reuse the cached download and start faster.

---

## Testing Each Module

Run these in order to verify each step of the pipeline works independently:

```bash
# Step 1 — Test preprocessing (NER, TF-IDF, sentence ranking)
python src/preprocessor.py

# Step 2 — Test question generation (T5 model)
python src/question_generator.py

# Step 3 — Test distractor generation (WordNet + NER)
python src/distractor_generator.py

# Step 4 — Test full pipeline end-to-end
python src/mcq_builder.py

# Step 5 — Test scoring
python src/evaluator.py
```

---

## Sample Output

**Input passage (ISRO):**
```
The Indian Space Research Organisation (ISRO) was founded in 1969 by Vikram Sarabhai.
ISRO developed India's first satellite, Aryabhata, which was launched in 1975.
The Chandrayaan-1 mission in 2008 discovered water molecules on the Moon.
In 2023, Chandrayaan-3 successfully landed near the lunar south pole.
The Mars Orbiter Mission, also called Mangalyaan, was launched in 2013.
```

**Generated questions:**

```
Q1: Who founded ISRO?
    A. Jawaharlal Nehru
    B. APJ Abdul Kalam
    C. Vikram Sarabhai  ✓
    D. Homi Bhabha

Q2: What was India's first satellite called?
    A. Chandrayaan
    B. Mangalyaan
    C. Rohini
    D. Aryabhata  ✓

Q3: When did the Chandrayaan-1 mission take place?
    A. 1975
    B. 2013
    C. 2023
    D. 2008  ✓

Q4: What mission made India the first Asian country to reach Mars orbit?
    A. Chandrayaan-3
    B. Aryabhata
    C. Mangalyaan  ✓
    D. Chandrayaan-1
```

---

## What Makes a Good Passage

The system performs best on **factual passages** that contain:

| Works well | Works poorly |
|---|---|
| People names (PERSON entities) | Opinion / descriptive text |
| Specific dates (DATE entities) | Passages with repeated entities |
| Organisation names (ORG entities) | Very short passages (< 5 sentences) |
| Place names (GPE entities) | Abstract/philosophical text |
| One clear fact per sentence | Sentences with multiple facts |

**Best passage types:** History, science, geography, biographies, Wikipedia-style articles

**Avoid:** Opinion pieces, marketing content, descriptive narratives without specific facts

---

## Known Limitations

1. **Passage type dependency** — Works best on factual text. Descriptive or opinion text tends to produce poor questions because it contains few or no named entities to use as answers.

2. **T5-small quality ceiling** — The model used (`t5-small`) has 60M parameters. Larger models like `t5-base` or `t5-large` would produce better questions but require more memory and time.

3. **Distractor diversity** — When a passage has few named entities, distractors may fall back to generic options. Fine-tuning a separate T5 model on the RACE dataset for distractor generation is one way to address this.

4. **English only** — The current pipeline only supports English text. Extending to Hindi or other Indic languages would require multilingual spaCy models and a multilingual QG model.

5. **No semantic deduplication** — Two questions from the same passage can sometimes be semantically similar even if worded differently.

---

## Future Work

- [ ] Fine-tune a T5 distractor generation model on the RACE dataset (100k exam questions)
- [ ] Add support for Hindi using IndicNLP + multilingual BERT
- [ ] Add PDF upload support so users can quiz themselves on any document
- [ ] BLEU/METEOR/ROUGE automated evaluation of generated questions
- [ ] Difficulty scoring per question based on distractor plausibility
- [ ] Export quiz as PDF for offline use

---

## Related Research

Papers that use similar approaches — cited for comparison:

1. **Automatic Generation of Multiple-Choice Questions (2023)**
   Zhang et al. — T5 with pre/postprocessing pipelines for MCQ generation
   https://arxiv.org/abs/2303.14576

2. **Deep Learning and Linguistic Feature Based Automatic MCQ Generation (Springer, ICDCIT 2022)**
   Agarwal et al. — DL + linguistic features for MCQ generation (same 3-step pipeline)
   https://link.springer.com/chapter/10.1007/978-3-030-94876-4_18

3. **End-to-End MCQ Generation Using T5 (ScienceDirect 2022)**
   Rodriguez-Torrealba et al. — Full T5-based pipeline with Wikipedia passages
   https://www.sciencedirect.com/science/article/pii/S0957417422014014

4. **Leaf — MCQ Generation System (ECIR 2022)**
   Vachev et al. — Two fine-tuned T5 models: one for QG, one for DG on RACE
   https://github.com/KristiyanVachev/Leaf-Question-Generation

5. **Automatic Distractor Generation — Systematic Review (PMC 2024)**
   Comprehensive review of distractor generation methods including WordNet and T5
   https://pmc.ncbi.nlm.nih.gov/articles/PMC11623049/

6. **Automatic Question Generation: A Review (Springer/PMC 2023)**
   Mulla & Gharpure — Survey of methodologies, datasets, and evaluation metrics
   https://pmc.ncbi.nlm.nih.gov/articles/PMC9886210/

**What differentiates this project from the above:**
- End-to-end pipeline with interactive quiz UI (most papers only generate questions)
- NER-type-matching distractor strategy (distractors of the same entity type as the answer are preferred, with WordNet and cross-label fallbacks)
- Multi-layer quality filtering at both question and MCQ level
- Answer circularity detection (rejects questions where answer appears in the question)

---

## Course Outcomes Covered

| CO | Description | How this project covers it |
|---|---|---|
| CO1 | Articulate NLP and word representation | TF-IDF, NER, WordNet, word embeddings all implemented and explained |
| CO2 | Build deep learning models for NLP problems | T5 transformer for QG (seq2seq), beam search decoding, transfer learning |
| CO3 | Implement ML/DL solutions in real context | End-to-end deployable system with Streamlit UI and interactive demo |

---

## Author

**Tanmay Jain**
Bennett University


---

*Built with spaCy, HuggingFace Transformers, NLTK, scikit-learn, and Streamlit.*