---
title: MCQ Generator
emoji: 📝
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.33.0
app_file: app/main.py
pinned: false
---

πŸ“ MCQ Generator β€” Automatic Multiple Choice Question Generator

An end-to-end NLP pipeline that reads any text passage and automatically generates a complete multiple-choice quiz with scoring and explanations.

Built as a course project for an NLP curriculum covering Modules I–IV: tokenization, word embeddings, transformers, and natural language generation.


📌 Table of Contents

  1. What This Project Does
  2. Live Demo
  3. How It Works β€” The Full Pipeline
  4. NLP Techniques Used
  5. Project Structure
  6. Each File Explained
  7. Tech Stack
  8. Setup & Installation
  9. Running the App
  10. Testing Each Module
  11. Sample Output
  12. What Makes a Good Passage
  13. Known Limitations
  14. Future Work
  15. Related Research
  16. Course Outcomes Covered

What This Project Does

Given any factual text passage, this system:

  1. Extracts the most important sentences using TF-IDF ranking
  2. Identifies answer candidates using Named Entity Recognition (NER)
  3. Generates natural language questions using a T5 transformer model
  4. Creates plausible wrong options (distractors) using WordNet and NER
  5. Presents an interactive quiz with scoring and per-question explanations

Example:

Input passage:

Albert Einstein was born on March 14, 1879, in Ulm, Germany.
He was awarded the Nobel Prize in Physics in 1921 for his
discovery of the photoelectric effect.

Generated MCQ:

Q: Where was Albert Einstein born?

A. France
B. Germany  ✓
C. United States
D. Switzerland

Live Demo

streamlit run app/main.py

Opens at http://localhost:8501 in your browser.


How It Works — The Full Pipeline

Raw Text Passage
       │
       ▼
┌─────────────────────────────────────────────┐
│  STEP 1: PREPROCESSING  (preprocessor.py)   │
│                                             │
│  • Split into sentences (spaCy)             │
│  • Rank by TF-IDF score (scikit-learn)      │
│  • Extract Named Entities (spaCy NER)       │
│  • Filter answer candidates (blacklist)     │
└─────────────────┬───────────────────────────┘
                  │  top sentences + answer candidates
                  ▼
┌─────────────────────────────────────────────┐
│  STEP 2: QUESTION GENERATION                │
│          (question_generator.py)            │
│                                             │
│  • Highlight answer in sentence with <hl>   │
│  • Feed to T5 transformer model             │
│  • Generate 3 candidate questions           │
│  • Validate: reject circular/vague Qs       │
└─────────────────┬───────────────────────────┘
                  │  (question, answer) pairs
                  ▼
┌─────────────────────────────────────────────┐
│  STEP 3: DISTRACTOR GENERATION              │
│          (distractor_generator.py)          │
│                                             │
│  Strategy 1: Same-type NER entities         │
│              from the passage               │
│  Strategy 2: WordNet hyponym siblings       │
│  Strategy 3: Cross-label fallback           │
└─────────────────┬───────────────────────────┘
                  │  3 wrong options per question
                  ▼
┌─────────────────────────────────────────────┐
│  STEP 4: MCQ ASSEMBLY + VALIDATION          │
│          (mcq_builder.py)                   │
│                                             │
│  • Combine answer + distractors             │
│  • Shuffle options randomly                 │
│  • Quality gate: dedup, similarity check    │
│  • Return list of MCQ objects               │
└─────────────────┬───────────────────────────┘
                  │  validated MCQ list
                  ▼
┌─────────────────────────────────────────────┐
│  STEP 5: QUIZ UI + SCORING                  │
│          (app/main.py + evaluator.py)       │
│                                             │
│  • Streamlit 3-screen app                   │
│  • Input → Quiz → Results                   │
│  • Score, feedback, explanations            │
└─────────────────────────────────────────────┘

NLP Techniques Used

Module I — Foundational NLP

| Technique | Where Used | Purpose |
|---|---|---|
| Tokenization | preprocessor.py | Split text into sentences and tokens using spaCy |
| Lemmatization | preprocessor.py | Normalize word forms for TF-IDF |
| Stop word removal | preprocessor.py | Filter noise before TF-IDF scoring |
| Named Entity Recognition (NER) | preprocessor.py | Find PERSON, ORG, DATE, GPE as answer candidates |
| POS Tagging | preprocessor.py | Identify nouns and proper nouns |
| WordNet | distractor_generator.py | Find semantically related words as distractors |
| Synsets / Hyponyms | distractor_generator.py | Navigate WordNet hierarchy for same-category words |

Module II — Word Representation

| Technique | Where Used | Purpose |
|---|---|---|
| TF-IDF | preprocessor.py | Rank sentences by information density |
| Word Embeddings (GloVe) | distractor_generator.py | Optional cosine-similarity based distractor finding |

TF-IDF explained:

  • TF (Term Frequency) = how often a word appears in this sentence
  • IDF (Inverse Document Frequency) = how rare the word is across all sentences
  • High TF-IDF score = sentence contains rare, informative words → good question source
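The ranking idea above can be sketched with scikit-learn in a few lines. This is an illustrative sketch, not the project's actual `rank_sentences` implementation; the scoring rule (sum of a sentence's TF-IDF weights) is one common choice.

```python
# Sketch of TF-IDF sentence ranking, assuming scikit-learn is installed.
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_sentences(sentences, top_n=3):
    """Score each sentence by the sum of its TF-IDF weights, return the top N."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(sentences)  # rows = sentences, cols = terms
    scores = matrix.sum(axis=1).A1                # flatten row sums to a 1-D array
    ranked = sorted(zip(sentences, scores), key=lambda p: p[1], reverse=True)
    return [s for s, _ in ranked[:top_n]]

sentences = [
    "ISRO was founded in 1969 by Vikram Sarabhai.",
    "It is an organisation.",
    "Chandrayaan-1 discovered water molecules on the Moon in 2008.",
]
print(rank_sentences(sentences, top_n=2))
```

Sentences packed with rare, specific terms score higher than filler like "It is an organisation.", which is exactly why they make better question sources.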

Module III — Deep Learning for NLP

| Technique | Where Used | Purpose |
|---|---|---|
| Transformers | question_generator.py | T5 model for question generation |
| Transfer Learning | question_generator.py | Using pre-trained T5 fine-tuned on SQuAD |
| Seq2Seq | question_generator.py | Encoder-decoder architecture of T5 |
| Beam Search | question_generator.py | Generate multiple question candidates, pick best |

Module IV — Advanced NLP

| Technique | Where Used | Purpose |
|---|---|---|
| T5 (Text-to-Text Transfer Transformer) | question_generator.py | State-of-the-art QG model |
| Natural Language Generation (NLG) | question_generator.py | Generating grammatical questions |
| Subword Tokenization (SentencePiece) | question_generator.py | T5's tokenizer handles rare/unknown words |
| Pre-trained Models | question_generator.py | valhalla/t5-small-qg-hl from HuggingFace |

Project Structure

mcq_generator/
│
├── src/                          # Core NLP pipeline modules
│   ├── __init__.py
│   ├── preprocessor.py           # Text cleaning, TF-IDF, NER, answer extraction
│   ├── question_generator.py     # T5-based question generation
│   ├── distractor_generator.py   # WordNet + NER distractor generation
│   ├── mcq_builder.py            # Pipeline orchestrator + MCQ dataclass
│   └── evaluator.py              # Answer checking and scoring
│
├── app/                          # Streamlit web application
│   ├── __init__.py
│   ├── main.py                   # 3-screen app: input → quiz → results
│   └── components.py             # Reusable UI components
│
├── data/
│   └── sample_passages.json      # 5 test passages (ISRO, Gandhi, AI, etc.)
│
├── models/                       # (gitignored) Downloaded model files
│   └── README.md
│
├── notebooks/                    # Jupyter notebooks for exploration
│
├── config.py                     # All settings in one place
├── requirements.txt              # Python dependencies
└── README.md                     # This file

Each File Explained

config.py

Central settings file. Every other module imports from here.

  • Model name, number of questions, sentence count, file paths
  • Change values here to tune the entire system without touching logic files

src/preprocessor.py

The NLP foundation of the project.

Key functions:

  • extract_sentences(text) — spaCy sentence boundary detection
  • rank_sentences(sentences) — TF-IDF scoring, returns top N most informative sentences
  • extract_answer_candidates(sentence) — NER-based extraction with strict quality filters
  • preprocess(text) — full pipeline, returns structured dict

Design decisions:

  • Only PERSON, ORG, GPE, DATE, EVENT, WORK_OF_ART NER labels are accepted as answers
  • A BLACKLIST of 30+ generic words ("annual", "various", "Moon") prevents trivial answers
  • Answers are sorted by priority: PERSON > ORG/GPE > DATE > others
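The filtering and priority rules above can be sketched in plain Python. The entity tuples, the BLACKLIST contents, and the function name here are illustrative assumptions, not copied from src/preprocessor.py:

```python
# Illustrative sketch of NER-label filtering, blacklisting, and priority sorting.
ACCEPTED_LABELS = {"PERSON", "ORG", "GPE", "DATE", "EVENT", "WORK_OF_ART"}
BLACKLIST = {"annual", "various", "moon"}                # real list has 30+ entries
PRIORITY = {"PERSON": 0, "ORG": 1, "GPE": 1, "DATE": 2}  # lower = preferred answer

def filter_answer_candidates(entities):
    """Keep accepted, non-blacklisted (text, label) pairs, sorted by priority."""
    kept = [
        (text, label) for text, label in entities
        if label in ACCEPTED_LABELS and text.lower() not in BLACKLIST
    ]
    return sorted(kept, key=lambda e: PRIORITY.get(e[1], 3))

entities = [("Moon", "WORK_OF_ART"), ("Vikram Sarabhai", "PERSON"),
            ("1969", "DATE"), ("ISRO", "ORG")]
print(filter_answer_candidates(entities))
# → [('Vikram Sarabhai', 'PERSON'), ('ISRO', 'ORG'), ('1969', 'DATE')]
```

Note how "Moon" is dropped by the blacklist even though spaCy tagged it with an accepted label.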

src/question_generator.py

Uses the valhalla/t5-small-qg-hl model — a T5-small fine-tuned on SQuAD for question generation.

How T5 QG works:

Input:  "generate question: ISRO was founded in <hl> 1969 <hl> by Vikram Sarabhai."
Output: "In what year was ISRO founded?"

Key functions:

  • highlight_answer(sentence, answer) — wraps answer in <hl> tags
  • generate_question(sentence, answer) — beam search with 5 beams, 3 candidates
  • answer_is_addressable(question, answer) — rejects circular, vague, or short questions

Quality filters applied:

  • Must start with a question word (what/who/when/where/which/how)
  • Answer must NOT appear in the question
  • Abbreviation trap detection (e.g. rejects Q: "What does ISRO stand for?" when A is the full name)
  • Minimum 5 words
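The highlight format and the filters above can be sketched in plain Python. The helper names and exact checks are illustrative, not the project's actual src/question_generator.py code:

```python
# Minimal sketch of answer highlighting and the listed quality filters.
QUESTION_WORDS = ("what", "who", "when", "where", "which", "how")

def highlight_answer(sentence, answer):
    """Wrap the answer span in <hl> tags, as the valhalla QG models expect."""
    return "generate question: " + sentence.replace(answer, f"<hl> {answer} <hl>")

def passes_quality_filters(question, answer):
    q = question.lower()
    if not q.startswith(QUESTION_WORDS):
        return False                  # must open with a question word
    if answer.lower() in q:
        return False                  # circular: answer leaked into the question
    if len(question.split()) < 5:
        return False                  # too short to be meaningful
    return True

print(highlight_answer("ISRO was founded in 1969.", "1969"))
# → generate question: ISRO was founded in <hl> 1969 <hl>.
print(passes_quality_filters("What year was ISRO founded?", "1969"))   # True
print(passes_quality_filters("Who founded ISRO in 1969?", "1969"))     # False (circular)
```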

src/distractor_generator.py

Generates 3 plausible wrong answer options. Uses a priority-based strategy chain.

Strategy 1 — Same-label NER (best): Finds other entities of the same NER type from the passage.

Answer: "1969" (DATE) → Distractors: ["1975", "2008", "2023"]  (other DATEs in passage)
Answer: "Vikram Sarabhai" (PERSON) → Distractors: ["Kalam", "Dhawan", "Nehru"]

Strategy 2 — WordNet hyponyms: Navigates the WordNet hierarchy to find sibling words in the same semantic category.

Answer: "India" → hypernym: "country" → hyponyms: ["China", "Brazil", "Pakistan"]

Strategy 3 — Cross-label fallback: Uses any other named entity from the passage if strategies 1 and 2 fail.
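Strategy 1 can be sketched in plain Python, operating on entities that have already been extracted from the passage. The function name and entity tuples are illustrative assumptions:

```python
# Sketch of Strategy 1: pick other entities sharing the answer's NER label.
import random

def same_label_distractors(answer, label, passage_entities, k=3, seed=None):
    """Return up to k other entity texts from the passage with the same label."""
    pool = [text for text, lbl in passage_entities
            if lbl == label and text != answer]
    rng = random.Random(seed)   # seed only for reproducible examples
    rng.shuffle(pool)
    return pool[:k]

entities = [("1969", "DATE"), ("1975", "DATE"), ("2008", "DATE"),
            ("2023", "DATE"), ("ISRO", "ORG"), ("Vikram Sarabhai", "PERSON")]
print(same_label_distractors("1969", "DATE", entities, seed=0))
```

Because the pool comes from the same passage, these distractors are plausible by construction; the WordNet and cross-label strategies only kick in when this pool is too small.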

src/mcq_builder.py

The single entry point that the UI calls. Orchestrates the entire pipeline.

MCQ dataclass:

@dataclass
class MCQ:
    question       : str
    options        : list      # 4 shuffled options
    correct_index  : int       # index of correct answer (0-3)
    correct_answer : str
    explanation    : str       # original sentence

Quality gate is_valid_mcq():

  • No two options can be too similar (catches "WWE" vs "World Wrestling Entertainment")
  • Answer must appear exactly once in options
  • Maximum 1 generic placeholder option allowed
  • Answer must not appear in question text
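Part of that quality gate can be sketched with difflib's string similarity. The threshold and helper names are assumptions for illustration; the real is_valid_mcq also handles cases (like abbreviation vs. full name) that plain string similarity alone would miss:

```python
# Hedged sketch of the MCQ quality gate using string-similarity deduplication.
from difflib import SequenceMatcher

def too_similar(a, b, threshold=0.8):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def is_valid_mcq(question, options, answer):
    if options.count(answer) != 1:
        return False                      # answer must appear exactly once
    if answer.lower() in question.lower():
        return False                      # answer must not leak into the question
    for i, a in enumerate(options):
        for b in options[i + 1:]:
            if too_similar(a, b):
                return False              # reject near-duplicate options
    return True

print(is_valid_mcq("Who founded ISRO?",
                   ["Vikram Sarabhai", "APJ Abdul Kalam",
                    "Homi Bhabha", "Jawaharlal Nehru"],
                   "Vikram Sarabhai"))
```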

src/evaluator.py

Checks answers and computes scores.

Returns:

{
  "score"     : 7,
  "total"     : 10,
  "percentage": 70.0,
  "feedback"  : "Good effort! Review the explanations...",
  "results"   : [ {per-question breakdown} ]
}
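The scoring logic behind that dict can be sketched as follows; the feedback thresholds and field names inside results are invented for illustration:

```python
# Sketch of the evaluator: compare chosen option indices to correct ones.
def evaluate(mcqs, user_answers):
    """mcqs: dicts with 'correct_index'; user_answers: chosen indices per question."""
    results = [
        {"question_index": i, "correct": chosen == mcq["correct_index"]}
        for i, (mcq, chosen) in enumerate(zip(mcqs, user_answers))
    ]
    score = sum(r["correct"] for r in results)
    total = len(results)
    pct = round(100.0 * score / total, 1) if total else 0.0
    feedback = ("Excellent!" if pct >= 80
                else "Good effort! Review the explanations..." if pct >= 50
                else "Keep practising.")
    return {"score": score, "total": total, "percentage": pct,
            "feedback": feedback, "results": results}

print(evaluate([{"correct_index": 1}, {"correct_index": 3}], [1, 0]))
```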

app/main.py

Streamlit app with 3 screens managed via st.session_state:

  • Screen 1 (input): Text area + question count slider + Generate button
  • Screen 2 (quiz): One question at a time, radio buttons, Previous/Next/Submit
  • Screen 3 (results): Score banner + per-question feedback with explanations

app/components.py

Reusable display functions:

  • render_question_card() — A/B/C/D labelled radio buttons
  • render_result_card() — green (correct) / red (wrong) with explanation
  • render_score_summary() — score banner + metric cards

Tech Stack

| Library | Version | Purpose |
|---|---|---|
| spaCy | 3.7.4 | Tokenization, NER, POS tagging, sentence splitting |
| transformers | 4.38.2 | T5 model for question generation |
| torch | 2.2.1 | PyTorch backend for transformers |
| nltk | 3.8.1 | WordNet access for distractor generation |
| scikit-learn | 1.4.1.post1 | TF-IDF vectorizer |
| sentencepiece | latest | T5's subword tokenizer |
| streamlit | 1.33.0 | Web UI framework |
| gensim | 4.3.2 | Word2Vec / GloVe loading (optional) |
| numpy | 1.26.4 | TF-IDF matrix operations |

Pre-trained model used:

  • valhalla/t5-small-qg-hl — T5-small fine-tuned on SQuAD 1.0 for answer-aware question generation using highlight format. Hosted on HuggingFace Hub, downloaded automatically on first run (~240MB).

Setup & Installation

Prerequisites

  • Python 3.11+
  • pip
  • Internet connection (first run downloads the T5 model)

Step 1 — Clone the repository

git clone https://github.com/tanmmayyy/mcq-generator.git
cd mcq-generator

Step 2 — Create a virtual environment

python -m venv myenv

# Windows
myenv\Scripts\activate

# Mac/Linux
source myenv/bin/activate

Step 3 — Install dependencies

pip install -r requirements.txt
pip install sentencepiece   # required for T5 tokenizer

Step 4 — Download spaCy language model

python -m spacy download en_core_web_sm

# If the default command fails, install the wheel directly:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl

Step 5 — Verify installation

python -c "import spacy; nlp = spacy.load('en_core_web_sm'); print('spaCy OK')"
python -c "from transformers import pipeline; print('Transformers OK')"

Running the App

streamlit run app/main.py

The app opens at http://localhost:8501. On first launch, the T5 model downloads (~240MB) and loads into memory — this takes 1–2 minutes. Subsequent launches are fast.


Testing Each Module

Run these in order to verify each step of the pipeline works independently:

# Step 1 — Test preprocessing (NER, TF-IDF, sentence ranking)
python src/preprocessor.py

# Step 2 — Test question generation (T5 model)
python src/question_generator.py

# Step 3 — Test distractor generation (WordNet + NER)
python src/distractor_generator.py

# Step 4 — Test full pipeline end-to-end
python src/mcq_builder.py

# Step 5 — Test scoring
python src/evaluator.py

Sample Output

Input passage (ISRO):

The Indian Space Research Organisation (ISRO) was founded in 1969 by Vikram Sarabhai.
ISRO developed India's first satellite, Aryabhata, which was launched in 1975.
The Chandrayaan-1 mission in 2008 discovered water molecules on the Moon.
In 2023, Chandrayaan-3 successfully landed near the lunar south pole.
The Mars Orbiter Mission, also called Mangalyaan, was launched in 2013.

Generated questions:

Q1: Who founded ISRO?
    A. Jawaharlal Nehru
    B. APJ Abdul Kalam
    C. Vikram Sarabhai  ✓
    D. Homi Bhabha

Q2: What was India's first satellite called?
    A. Chandrayaan
    B. Mangalyaan
    C. Rohini
    D. Aryabhata  ✓

Q3: When did the Chandrayaan-1 mission take place?
    A. 1975
    B. 2013
    C. 2023
    D. 2008  ✓

Q4: What mission made India the first Asian country to reach Mars orbit?
    A. Chandrayaan-3
    B. Aryabhata
    C. Mangalyaan  ✓
    D. Chandrayaan-1

What Makes a Good Passage

The system performs best on factual passages that contain:

| Works well | Works poorly |
|---|---|
| People names (PERSON entities) | Opinion / descriptive text |
| Specific dates (DATE entities) | Passages with repeated entities |
| Organisation names (ORG entities) | Very short passages (< 5 sentences) |
| Place names (GPE entities) | Abstract/philosophical text |
| One clear fact per sentence | Sentences with multiple facts |

Best passage types: History, science, geography, biographies, Wikipedia-style articles

Avoid: Opinion pieces, marketing content, descriptive narratives without specific facts


Known Limitations

  1. Passage type dependency — Works best on factual text. Descriptive or opinion text produces poor questions because there are no named entities to use as answers.

  2. T5-small quality ceiling — The model used (t5-small) has 60M parameters. Larger models like t5-base or t5-large would produce better questions but require more memory and time.

  3. Distractor diversity — When a passage has few named entities, distractors may fall back to generic options. Fine-tuning a separate T5 model on the RACE dataset for distractor generation would fix this.

  4. English only — The current pipeline only supports English text. Extending to Hindi or other Indic languages would require multilingual spaCy models and a multilingual QG model.

  5. No semantic deduplication — Two questions from the same passage can sometimes be semantically similar even if worded differently.


Future Work

  • Fine-tune a T5 distractor generation model on the RACE dataset (100k exam questions)
  • Add support for Hindi using IndicNLP + multilingual BERT
  • Add PDF upload support so users can quiz themselves on any document
  • BLEU/METEOR/ROUGE automated evaluation of generated questions
  • Difficulty scoring per question based on distractor plausibility
  • Export quiz as PDF for offline use

Related Research

Papers that use similar approaches — cited for comparison:

  1. Automatic Generation of Multiple-Choice Questions (2023), Zhang et al. — T5 with pre/postprocessing pipelines for MCQ generation. https://arxiv.org/abs/2303.14576

  2. Deep Learning and Linguistic Feature Based Automatic MCQ Generation (Springer, ICDCIT 2022), Agarwal et al. — DL + linguistic features for MCQ generation (same 3-step pipeline). https://link.springer.com/chapter/10.1007/978-3-030-94876-4_18

  3. End-to-End MCQ Generation Using T5 (ScienceDirect 2022), Rodriguez-Torrealba et al. — Full T5-based pipeline with Wikipedia passages. https://www.sciencedirect.com/science/article/pii/S0957417422014014

  4. Leaf — MCQ Generation System (ECIR 2022), Vachev et al. — Two fine-tuned T5 models: one for QG, one for DG on RACE. https://github.com/KristiyanVachev/Leaf-Question-Generation

  5. Automatic Distractor Generation — Systematic Review (PMC 2024) — Comprehensive review of distractor generation methods including WordNet and T5. https://pmc.ncbi.nlm.nih.gov/articles/PMC11623049/

  6. Automatic Question Generation: A Review (Springer/PMC 2023), Mulla & Gharpure — Survey of methodologies, datasets, and evaluation metrics. https://pmc.ncbi.nlm.nih.gov/articles/PMC9886210/

What differentiates this project from the above:

  • End-to-end pipeline with interactive quiz UI (most papers only generate questions)
  • NER-type-matching distractor strategy (distractors always same entity type as answer)
  • Multi-layer quality filtering at both question and MCQ level
  • Answer circularity detection (rejects questions where answer appears in the question)

Course Outcomes Covered

| CO | Description | How this project covers it |
|---|---|---|
| CO1 | Articulate NLP and word representation | TF-IDF, NER, WordNet, word embeddings all implemented and explained |
| CO2 | Build deep learning models for NLP problems | T5 transformer for QG (seq2seq), beam search decoding, transfer learning |
| CO3 | Implement ML/DL solutions in a real context | End-to-end deployable system with Streamlit UI and interactive demo |

Author

Tanmay Jain, Bennett University


Built with spaCy, HuggingFace Transformers, NLTK, scikit-learn, and Streamlit.