---
title: MCQ Generator
emoji: 📝
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.33.0
app_file: app/main.py
pinned: false
---

# 📝 MCQ Generator — Automatic Multiple Choice Question Generator

> **An end-to-end NLP pipeline that reads any text passage and automatically generates a complete multiple-choice quiz with scoring and explanations.**

Built as a course project for an NLP curriculum covering Modules I–IV: tokenization, word embeddings, transformers, and natural language generation.

---

## 📌 Table of Contents

1. [What This Project Does](#what-this-project-does)
2. [Live Demo](#live-demo)
3. [How It Works — The Full Pipeline](#how-it-works--the-full-pipeline)
4. [NLP Techniques Used](#nlp-techniques-used)
5. [Project Structure](#project-structure)
6. [Each File Explained](#each-file-explained)
7. [Tech Stack](#tech-stack)
8. [Setup & Installation](#setup--installation)
9. [Running the App](#running-the-app)
10. [Testing Each Module](#testing-each-module)
11. [Sample Output](#sample-output)
12. [What Makes a Good Passage](#what-makes-a-good-passage)
13. [Known Limitations](#known-limitations)
14. [Future Work](#future-work)
15. [Related Research](#related-research)
16. [Course Outcomes Covered](#course-outcomes-covered)

---

## What This Project Does

Given any factual text passage, this system:

1. **Extracts** the most important sentences using TF-IDF ranking
2. **Identifies** answer candidates using Named Entity Recognition (NER)
3. **Generates** natural language questions using a T5 transformer model
4. **Creates** plausible wrong options (distractors) using WordNet and NER
5. **Presents** an interactive quiz with scoring and per-question explanations

**Example:**

Input passage:

```
Albert Einstein was born on March 14, 1879, in Ulm, Germany. He was awarded
the Nobel Prize in Physics in 1921 for his discovery of the photoelectric effect.
```

Generated MCQ:

```
Q: Where was Albert Einstein born?
A. France
B. Germany ✓
C. United States
D. Switzerland
```

---

## Live Demo

```bash
streamlit run app/main.py
```

Opens at `http://localhost:8501` in your browser.

---

## How It Works — The Full Pipeline

```
Raw Text Passage
        │
        ▼
┌─────────────────────────────────────────────┐
│ STEP 1: PREPROCESSING (preprocessor.py)     │
│                                             │
│ • Split into sentences (spaCy)              │
│ • Rank by TF-IDF score (scikit-learn)       │
│ • Extract Named Entities (spaCy NER)        │
│ • Filter answer candidates (blacklist)      │
└─────────────────┬───────────────────────────┘
                  │ top sentences + answer candidates
                  ▼
┌─────────────────────────────────────────────┐
│ STEP 2: QUESTION GENERATION                 │
│         (question_generator.py)             │
│                                             │
│ • Highlight answer in sentence with <hl>    │
│ • Feed to T5 transformer model              │
│ • Generate 3 candidate questions            │
│ • Validate: reject circular/vague Qs        │
└─────────────────┬───────────────────────────┘
                  │ (question, answer) pairs
                  ▼
┌─────────────────────────────────────────────┐
│ STEP 3: DISTRACTOR GENERATION               │
│         (distractor_generator.py)           │
│                                             │
│ Strategy 1: Same-type NER entities          │
│             from the passage                │
│ Strategy 2: WordNet hyponym siblings        │
│ Strategy 3: Cross-label fallback            │
└─────────────────┬───────────────────────────┘
                  │ 3 wrong options per question
                  ▼
┌─────────────────────────────────────────────┐
│ STEP 4: MCQ ASSEMBLY + VALIDATION           │
│         (mcq_builder.py)                    │
│                                             │
│ • Combine answer + distractors              │
│ • Shuffle options randomly                  │
│ • Quality gate: dedup, similarity check     │
│ • Return list of MCQ objects                │
└─────────────────┬───────────────────────────┘
                  │ validated MCQ list
                  ▼
┌─────────────────────────────────────────────┐
│ STEP 5: QUIZ UI + SCORING                   │
│         (app/main.py + evaluator.py)        │
│                                             │
│ • Streamlit 3-screen app                    │
│ • Input → Quiz → Results                    │
│ • Score, feedback, explanations             │
└─────────────────────────────────────────────┘
```

---

## NLP Techniques Used

### Module I — Foundational NLP

| Technique | Where Used | Purpose |
|---|---|---|
| Tokenization | `preprocessor.py` | Split text into sentences and tokens using spaCy |
| Lemmatization | `preprocessor.py` | Normalize word forms for TF-IDF |
| Stop word removal | `preprocessor.py` | Filter noise before TF-IDF scoring |
| Named Entity Recognition (NER) | `preprocessor.py` | Find PERSON, ORG, DATE, GPE as answer candidates |
| POS Tagging | `preprocessor.py` | Identify nouns and proper nouns |
| WordNet | `distractor_generator.py` | Find semantically related words as distractors |
| Synsets / Hyponyms | `distractor_generator.py` | Navigate the WordNet hierarchy for same-category words |

### Module II — Word Representation

| Technique | Where Used | Purpose |
|---|---|---|
| TF-IDF | `preprocessor.py` | Rank sentences by information density |
| Word Embeddings (GloVe) | `distractor_generator.py` | Optional cosine-similarity based distractor finding |

**TF-IDF explained:**

- **TF (Term Frequency)** = how often a word appears in *this* sentence
- **IDF (Inverse Document Frequency)** = how rare the word is across *all* sentences
- High TF-IDF score = the sentence contains rare, informative words → a good question source

### Module III — Deep Learning for NLP

| Technique | Where Used | Purpose |
|---|---|---|
| Transformers | `question_generator.py` | T5 model for question generation |
| Transfer Learning | `question_generator.py` | Using a pre-trained T5 fine-tuned on SQuAD |
| Seq2Seq | `question_generator.py` | Encoder-decoder architecture of T5 |
| Beam Search | `question_generator.py` | Generate multiple question candidates, pick the best |

### Module IV — Advanced NLP

| Technique | Where Used | Purpose |
|---|---|---|
| T5 (Text-to-Text Transfer Transformer) | `question_generator.py` | State-of-the-art QG model |
| Natural Language Generation (NLG) | `question_generator.py` | Generating grammatical questions |
| Subword Tokenization (SentencePiece) | `question_generator.py` | T5's tokenizer handles rare/unknown words |
| Pre-trained Models | `question_generator.py` | `valhalla/t5-small-qg-hl` from HuggingFace |

---

## Project Structure

```
mcq_generator/
│
├── src/                          # Core NLP pipeline modules
│   ├── __init__.py
│   ├── preprocessor.py           # Text cleaning, TF-IDF, NER, answer extraction
│   ├── question_generator.py     # T5-based question generation
│   ├── distractor_generator.py   # WordNet + NER distractor generation
│   ├── mcq_builder.py            # Pipeline orchestrator + MCQ dataclass
│   └── evaluator.py              # Answer checking and scoring
│
├── app/                          # Streamlit web application
│   ├── __init__.py
│   ├── main.py                   # 3-screen app: input → quiz → results
│   └── components.py             # Reusable UI components
│
├── data/
│   └── sample_passages.json      # 5 test passages (ISRO, Gandhi, AI, etc.)
│
├── models/                       # (gitignored) Downloaded model files
│   └── README.md
│
├── notebooks/                    # Jupyter notebooks for exploration
│
├── config.py                     # All settings in one place
├── requirements.txt              # Python dependencies
└── README.md                     # This file
```

---

## Each File Explained

### `config.py`

Central settings file. Every other module imports from here.

- Model name, number of questions, sentence count, file paths
- Change values here to tune the entire system without touching logic files

### `src/preprocessor.py`

The NLP foundation of the project.

**Key functions:**

- `extract_sentences(text)` — spaCy sentence boundary detection
- `rank_sentences(sentences)` — TF-IDF scoring, returns the top N most informative sentences
- `extract_answer_candidates(sentence)` — NER-based extraction with strict quality filters
- `preprocess(text)` — full pipeline, returns a structured dict

**Design decisions:**

- Only `PERSON`, `ORG`, `GPE`, `DATE`, `EVENT`, `WORK_OF_ART` NER labels are accepted as answers
- A `BLACKLIST` of 30+ generic words ("annual", "various", "Moon") prevents trivial answers
- Answers are sorted by priority: PERSON > ORG/GPE > DATE > others

### `src/question_generator.py`

Uses the `valhalla/t5-small-qg-hl` model — a T5-small fine-tuned on SQuAD for question generation.

**How T5 QG works:**

```
Input:  "generate question: ISRO was founded in <hl> 1969 <hl> by Vikram Sarabhai."
Output: "In what year was ISRO founded?"
```

**Key functions:**

- `highlight_answer(sentence, answer)` — wraps the answer in `<hl>` tags
- `generate_question(sentence, answer)` — beam search with 5 beams, 3 candidates
- `answer_is_addressable(question, answer)` — rejects circular, vague, or short questions

**Quality filters applied:**

- Must start with a question word (what/who/when/where/which/how)
- The answer must NOT appear in the question
- Abbreviation-trap detection (e.g. rejects "What does ISRO stand for?" when the answer is the full name)
- Minimum length of 5 words

### `src/distractor_generator.py`

Generates 3 plausible wrong answer options using a priority-based strategy chain.

**Strategy 1 — Same-label NER (best):** finds other entities of the same NER type from the passage.

```
Answer: "1969" (DATE)
→ Distractors: ["1975", "2008", "2023"]   (other DATEs in the passage)

Answer: "Vikram Sarabhai" (PERSON)
→ Distractors: ["Kalam", "Dhawan", "Nehru"]
```

**Strategy 2 — WordNet hyponyms:** navigates the WordNet hierarchy to find sibling words in the same semantic category.

```
Answer: "India"
→ hypernym: "country"
→ hyponyms: ["China", "Brazil", "Pakistan"]
```

**Strategy 3 — Cross-label fallback:** uses any other named entity from the passage if strategies 1 and 2 fail.

### `src/mcq_builder.py`

The single entry point that the UI calls. Orchestrates the entire pipeline.

**MCQ dataclass:**

```python
@dataclass
class MCQ:
    question: str
    options: list         # 4 shuffled options
    correct_index: int    # index of correct answer (0-3)
    correct_answer: str
    explanation: str      # original sentence
```

**Quality gate `is_valid_mcq()`:**

- No two options can be too similar (catches "WWE" vs "World Wrestling Entertainment")
- The answer must appear exactly once in the options
- At most 1 generic placeholder option is allowed
- The answer must not appear in the question text

### `src/evaluator.py`

Checks answers and computes scores.

**Returns:**

```python
{
    "score":      7,
    "total":      10,
    "percentage": 70.0,
    "feedback":   "Good effort! Review the explanations...",
    "results":    [ {per-question breakdown} ]
}
```

### `app/main.py`

Streamlit app with 3 screens managed via `st.session_state`:

- **Screen 1 (input):** Text area + question count slider + Generate button
- **Screen 2 (quiz):** One question at a time, radio buttons, Previous/Next/Submit
- **Screen 3 (results):** Score banner + per-question feedback with explanations

### `app/components.py`

Reusable display functions:

- `render_question_card()` — A/B/C/D labelled radio buttons
- `render_result_card()` — green (correct) / red (wrong) with explanation
- `render_score_summary()` — score banner + metric cards

---

## Tech Stack

| Library | Version | Purpose |
|---|---|---|
| `spaCy` | 3.7.4 | Tokenization, NER, POS tagging, sentence splitting |
| `transformers` | 4.38.2 | T5 model for question generation |
| `torch` | 2.2.1 | PyTorch backend for transformers |
| `nltk` | 3.8.1 | WordNet access for distractor generation |
| `scikit-learn` | 1.4.1.post1 | TF-IDF vectorizer |
| `sentencepiece` | latest | T5's subword tokenizer |
| `streamlit` | 1.33.0 | Web UI framework |
| `gensim` | 4.3.2 | Word2Vec / GloVe loading (optional) |
| `numpy` | 1.26.4 | TF-IDF matrix operations |

**Pre-trained model used:**

- `valhalla/t5-small-qg-hl` — a T5-small fine-tuned on SQuAD for answer-aware question generation using the highlight format. Hosted on the HuggingFace Hub and downloaded automatically on first run (~240 MB).
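The TF-IDF sentence-ranking idea from Step 1 can be sketched in a few lines. The sketch below is a stdlib-only illustration of the scoring, not the project's actual code: `preprocessor.py` uses scikit-learn's `TfidfVectorizer` and spaCy sentence splitting, whereas here a regex splitter and a hand-rolled stop list stand in for both.

```python
import math
import re
from collections import Counter

# Tiny stop list standing in for scikit-learn's built-in English stop words.
STOPWORDS = {"the", "a", "an", "is", "was", "it", "in", "by", "of",
             "on", "to", "and", "for", "he", "she", "his", "her"}

def tokenize(sentence: str) -> list[str]:
    return [t for t in re.findall(r"[a-z0-9]+", sentence.lower())
            if t not in STOPWORDS]

def rank_sentences(text: str, top_n: int = 2) -> list[str]:
    """Return the top_n sentences by summed TF-IDF weight."""
    # Naive regex split stands in for spaCy's sentence boundary detection.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    n = len(sentences)
    token_sets = [set(tokenize(s)) for s in sentences]
    # IDF: a token that appears in fewer sentences gets a higher weight.
    idf = {t: math.log(n / sum(t in ts for ts in token_sets))
           for ts in token_sets for t in ts}

    def score(sentence: str) -> float:
        # Sum of TF x IDF: long, fact-dense sentences with rare tokens win.
        counts = Counter(tokenize(sentence))
        return sum(c * idf[t] for t, c in counts.items())

    return sorted(sentences, key=score, reverse=True)[:top_n]
```

On the ISRO passage, generic filler like "It is an agency." scores near zero while the date- and name-heavy sentences rise to the top — which is exactly why TF-IDF ranking is a good pre-filter for question sources.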
---

## Setup & Installation

### Prerequisites

- Python 3.11+
- pip
- Internet connection (the first run downloads the T5 model)

### Step 1 — Clone the repository

```bash
git clone https://github.com/tanmmayyy/mcq-generator.git
cd mcq-generator
```

### Step 2 — Create a virtual environment

```bash
python -m venv myenv

# Windows
myenv\Scripts\activate

# Mac/Linux
source myenv/bin/activate
```

### Step 3 — Install dependencies

```bash
pip install -r requirements.txt
pip install sentencepiece   # required for the T5 tokenizer
```

### Step 4 — Download the spaCy language model

```bash
python -m spacy download en_core_web_sm

# If the default command fails:
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl
```

### Step 5 — Verify the installation

```bash
python -c "import spacy; nlp = spacy.load('en_core_web_sm'); print('spaCy OK')"
python -c "from transformers import pipeline; print('Transformers OK')"
```

---

## Running the App

```bash
streamlit run app/main.py
```

The app opens at `http://localhost:8501`. On first launch, the T5 model downloads (~240 MB) and loads into memory — this takes 1–2 minutes. Subsequent launches are fast.

---

## Testing Each Module

Run these in order to verify that each step of the pipeline works independently:

```bash
# Step 1 — Test preprocessing (NER, TF-IDF, sentence ranking)
python src/preprocessor.py

# Step 2 — Test question generation (T5 model)
python src/question_generator.py

# Step 3 — Test distractor generation (WordNet + NER)
python src/distractor_generator.py

# Step 4 — Test the full pipeline end-to-end
python src/mcq_builder.py

# Step 5 — Test scoring
python src/evaluator.py
```

---

## Sample Output

**Input passage (ISRO):**

```
The Indian Space Research Organisation (ISRO) was founded in 1969 by Vikram
Sarabhai. ISRO developed India's first satellite, Aryabhata, which was launched
in 1975. The Chandrayaan-1 mission in 2008 discovered water molecules on the
Moon. In 2023, Chandrayaan-3 successfully landed near the lunar south pole.
The Mars Orbiter Mission, also called Mangalyaan, was launched in 2013.
```

**Generated questions:**

```
Q1: Who founded ISRO?
A. Jawaharlal Nehru
B. APJ Abdul Kalam
C. Vikram Sarabhai ✓
D. Homi Bhabha

Q2: What was India's first satellite called?
A. Chandrayaan
B. Mangalyaan
C. Rohini
D. Aryabhata ✓

Q3: When did the Chandrayaan-1 mission take place?
A. 1975
B. 2013
C. 2023
D. 2008 ✓

Q4: What mission made India the first Asian country to reach Mars orbit?
A. Chandrayaan-3
B. Aryabhata
C. Mangalyaan ✓
D. Chandrayaan-1
```

---

## What Makes a Good Passage

The system performs best on **factual passages** that contain:

| Works well | Works poorly |
|---|---|
| People names (PERSON entities) | Opinion / descriptive text |
| Specific dates (DATE entities) | Passages with repeated entities |
| Organisation names (ORG entities) | Very short passages (< 5 sentences) |
| Place names (GPE entities) | Abstract/philosophical text |
| One clear fact per sentence | Sentences with multiple facts |

**Best passage types:** history, science, geography, biographies, Wikipedia-style articles

**Avoid:** opinion pieces, marketing content, descriptive narratives without specific facts

---

## Known Limitations

1. **Passage type dependency** — Works best on factual text. Descriptive or opinion text produces poor questions because there are no named entities to use as answers.
2. **T5-small quality ceiling** — The model used (`t5-small`) has 60M parameters. Larger models like `t5-base` or `t5-large` would produce better questions but require more memory and time.
3. **Distractor diversity** — When a passage has few named entities, distractors may fall back to generic options. Fine-tuning a separate T5 model on the RACE dataset for distractor generation would fix this.
4. **English only** — The current pipeline only supports English text. Extending to Hindi or other Indic languages would require multilingual spaCy models and a multilingual QG model.
5. **No semantic deduplication** — Two questions from the same passage can sometimes be semantically similar even if worded differently.

---

## Future Work

- [ ] Fine-tune a T5 distractor generation model on the RACE dataset (100k exam questions)
- [ ] Add support for Hindi using IndicNLP + multilingual BERT
- [ ] Add PDF upload support so users can quiz themselves on any document
- [ ] BLEU/METEOR/ROUGE automated evaluation of generated questions
- [ ] Difficulty scoring per question based on distractor plausibility
- [ ] Export quiz as PDF for offline use

---

## Related Research

Papers that use similar approaches — cited for comparison:

1. **Automatic Generation of Multiple-Choice Questions (2023)**
   Zhang et al. — T5 with pre/postprocessing pipelines for MCQ generation
   https://arxiv.org/abs/2303.14576
2. **Deep Learning and Linguistic Feature Based Automatic MCQ Generation (Springer, ICDCIT 2022)**
   Agarwal et al. — DL + linguistic features for MCQ generation (same 3-step pipeline)
   https://link.springer.com/chapter/10.1007/978-3-030-94876-4_18
3. **End-to-End MCQ Generation Using T5 (ScienceDirect 2022)**
   Rodriguez-Torrealba et al. — Full T5-based pipeline with Wikipedia passages
   https://www.sciencedirect.com/science/article/pii/S0957417422014014
4. **Leaf — MCQ Generation System (ECIR 2022)**
   Vachev et al. — Two fine-tuned T5 models: one for QG, one for DG on RACE
   https://github.com/KristiyanVachev/Leaf-Question-Generation
5. **Automatic Distractor Generation — Systematic Review (PMC 2024)**
   Comprehensive review of distractor generation methods including WordNet and T5
   https://pmc.ncbi.nlm.nih.gov/articles/PMC11623049/
6. **Automatic Question Generation: A Review (Springer/PMC 2023)**
   Mulla & Gharpure — Survey of methodologies, datasets, and evaluation metrics
   https://pmc.ncbi.nlm.nih.gov/articles/PMC9886210/

**What differentiates this project from the above:**

- End-to-end pipeline with an interactive quiz UI (most papers only generate questions)
- NER-type-matching distractor strategy (distractors are always the same entity type as the answer)
- Multi-layer quality filtering at both the question and MCQ level
- Answer circularity detection (rejects questions where the answer appears in the question)

---

## Course Outcomes Covered

| CO | Description | How this project covers it |
|---|---|---|
| CO1 | Articulate NLP and word representation | TF-IDF, NER, WordNet, and word embeddings all implemented and explained |
| CO2 | Build deep learning models for NLP problems | T5 transformer for QG (seq2seq), beam search decoding, transfer learning |
| CO3 | Implement ML/DL solutions in a real context | End-to-end deployable system with a Streamlit UI and interactive demo |

---

## Author

**Tanmay Jain**
Bennett University

---

*Built with spaCy, HuggingFace Transformers, NLTK, scikit-learn, and Streamlit.*