shourya committed 26 days ago

Commit

2a31b59

1 Parent(s): 6995afb

Add pre-session setup scripts

- download_models.py: Pre-cache all models locally (~2-3GB)
- convert_slides_to_pptx.py: Convert markdown slides to PPTX
- prewarm_spaces.py: Test Spaces app and verify everything works
- setup.sh: Automated all-in-one setup
- scripts/README.md: Complete documentation and troubleshooting

These scripts prepare for your session in ~30 minutes total.
Recommended: Run on laptop ~30 min before presenting.

Files changed (5)

scripts/README.md +167 -0
scripts/convert_slides_to_pptx.py +128 -0
scripts/download_models.py +71 -0
scripts/prewarm_spaces.py +105 -0
scripts/setup.sh +43 -0

scripts/README.md ADDED

	@@ -0,0 +1,167 @@

+# Pre-Session Setup Scripts
+These scripts help you prepare for the HuggingFace Enabling Sessions by pre-downloading models, converting slides, and testing the Spaces app.
+## Quick Start (All-in-One)
+```bash
+bash scripts/setup.sh
+```
+This runs all three scripts in sequence and prepares everything for your session.
+---
+## Individual Scripts
+### 1️⃣ Download Models Locally
+**Purpose:** Pre-cache all HuggingFace models on your laptop so demos run instantly
+**File:** `scripts/download_models.py`
+**Usage:**
+```bash
+python3 scripts/download_models.py
+```
+**What it does:**
+- Downloads 5 pre-trained models (~2-3 GB total)
+- Stores in `~/.cache/huggingface/hub`
+- Takes 10-20 minutes depending on internet speed
+- Models: Sentiment, NER, QA, Summarization, Semantic Similarity
+**Why:** Each model is downloaded only once. Later runs load it from the local cache until that cache is cleared.
+---
+### 2️⃣ Convert Markdown Slides to PowerPoint
+**Purpose:** Convert markdown slides to editable PPTX format for your presentation
+**File:** `scripts/convert_slides_to_pptx.py`
+**Requirements:**
+```bash
+pip install python-pptx
+```
+**Usage:**
+```bash
+python3 scripts/convert_slides_to_pptx.py
+```
+**What it does:**
+- Converts `slides/SESSION1_SLIDES.md` → `slides/SESSION1_SLIDES.pptx`
+- Converts `slides/SESSION2_SLIDES.md` → `slides/SESSION2_SLIDES.pptx`
+- Applies basic formatting (headers, text, colors)
+**Next steps:**
+- Open the PPTX files in PowerPoint or LibreOffice
+- Customize colors, fonts, add speaker notes
+- Add images or animations as desired
+**Note:** The conversion is basic. You can enhance the PPTX afterwards in PowerPoint.
+---
+### 3️⃣ Pre-Warm Spaces App
+**Purpose:** Check that the Spaces app is online and print a session readiness checklist
+**File:** `scripts/prewarm_spaces.py`
+**Usage:**
+```bash
+python3 scripts/prewarm_spaces.py
+```
+**What it does:**
+- Checks if Spaces app is online
+- Waits for startup if needed (first-time builds take 2-3 min)
+- Provides session readiness checklist
+- Shows timing and performance info
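+**Expected output (abridged):**
+```
+✅ Spaces app is online
+✅ Pre-warming complete!
+  ✓ Sentiment Analysis (DistilBERT)
+  ✓ Named Entity Recognition (BERT)
+  ✓ Question Answering (RoBERTa)
+  ✓ Text Summarization (BART)
+  ✓ Semantic Similarity (Sentence-BERT)
+```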
+---
+## Setup Timeline
+| Task | Time | Notes |
+|------|------|-------|
+| Model Download | 10-20 min | Run once; cached until the cache is cleared |
+| Slide Conversion | 1-2 min | One-time only |
+| Spaces Pre-warm | 2-5 min | Depends on app startup |
+| **Total** | **15-30 min** | Start at least 30 min before the session |
+---
+## Session Day Checklist
+- [ ] Run `bash scripts/setup.sh` (30 min before session)
+- [ ] Open slides in PowerPoint/LibreOffice
+- [ ] Test one demo on Spaces (click "Analyze Sentiment")
+- [ ] Share Spaces URL with attendees: `https://huggingface.co/spaces/Shouryahere/infy`
+- [ ] Open `SPEAKER_NOTES.md` for timing reference
+- [ ] Screen share test (Zoom, Teams, etc.)
+- [ ] Audio test
+---
+## Troubleshooting
+### "ModuleNotFoundError: No module named 'pptx'"
+```bash
+pip install python-pptx
+```
+### Models still downloading during session?
+- Run `scripts/download_models.py` earlier (it's idempotent, safe to run multiple times)
+- Check internet speed: Large models (BART) can take 5+ min on slow networks
+### Spaces URL not responding?
+- Check: https://huggingface.co/spaces/Shouryahere/infy
+- If showing "Building", wait 2-3 minutes
+- If showing red error, check Spaces build logs
+### Slide conversion looks poorly formatted?
+- This script does basic conversion; use PowerPoint to enhance
+- Recommended: Open PPTX and manually adjust formatting to match your company branding
+---
+## Manual Alternative
+If you prefer not to run scripts:
+1. **Download models:** `huggingface-cli download distilbert-base-uncased-finetuned-sst-2-english` (repeat for each model)
+2. **Convert slides:** Use CloudConvert, Zamzar, or open markdown in LibreOffice Impress
+3. **Pre-warm Spaces:** Just click each button once when you're ready
+---
+## Advanced Options
+### Custom model downloads
+Edit `config.py` to change which models are downloaded
+### Custom slide formatting
+Edit `scripts/convert_slides_to_pptx.py` to adjust colors, fonts, sizes
+### Offline demo fallback
+Pre-record a video of running the demos locally in case Spaces is slow
+---
+📝 **Questions?** See SPEAKER_NOTES.md for more context.
+🚀 **Ready to present!**
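
Note on the README's Setup Timeline: summing the per-task ranges is a quick check on the stated total. A minimal sketch, with the ranges copied from the table; `TASKS` and `total_range` are illustrative names, not part of this commit.

```python
# Per-task (min, max) durations in minutes, copied from the README timeline table.
TASKS = {
    "Model Download": (10, 20),
    "Slide Conversion": (1, 2),
    "Spaces Pre-warm": (2, 5),
}


def total_range(tasks):
    """Return the (best-case, worst-case) total in minutes."""
    low = sum(lo for lo, _ in tasks.values())
    high = sum(hi for _, hi in tasks.values())
    return low, high


low, high = total_range(TASKS)
print(f"Estimated total: {low}-{high} min")
```

The sum comes to 13-27 minutes, which the table rounds to 15-30, so starting 30 minutes ahead leaves little slack on a slow network.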

scripts/convert_slides_to_pptx.py ADDED

	@@ -0,0 +1,128 @@

+#!/usr/bin/env python3
+"""
+Convert markdown slides to PowerPoint (PPTX) format
+Requires: python-pptx
+Install: pip install python-pptx
+"""
+from pptx import Presentation
+from pptx.util import Inches, Pt
+from pptx.enum.text import PP_ALIGN
+from pptx.dml.color import RGBColor
+import re
+import sys
+def markdown_to_pptx(md_file, pptx_file):
+    """Convert markdown slides to PPTX format."""
+    # Read markdown file
+    with open(md_file, 'r', encoding='utf-8') as f:
+        content = f.read()
+    # Split by slide delimiter (---)
+    slides_text = content.split('\n---\n')
+    # Create presentation
+    prs = Presentation()
+    prs.slide_width = Inches(10)
+    prs.slide_height = Inches(7.5)
+    # Define color scheme
+    TITLE_COLOR = RGBColor(0, 102, 204)  # Blue
+    TEXT_COLOR = RGBColor(50, 50, 50)    # Dark gray
+    for slide_content in slides_text:
+        slide_content = slide_content.strip()
+        if not slide_content:
+            continue
+        # Add blank slide
+        blank_slide_layout = prs.slide_layouts[6]
+        slide = prs.slides.add_slide(blank_slide_layout)
+        # Add background
+        background = slide.background
+        fill = background.fill
+        fill.solid()
+        fill.fore_color.rgb = RGBColor(255, 255, 255)
+        # Parse slide content
+        lines = slide_content.split('\n')
+        # Add title (first line with #)
+        title_found = False
+        current_y = Inches(0.5)
+        for line in lines:
+            if line.startswith('# '):
+                title_box = slide.shapes.add_textbox(Inches(0.5), current_y, Inches(9), Inches(1))
+                title_frame = title_box.text_frame
+                title_frame.word_wrap = True
+                p = title_frame.paragraphs[0]
+                p.text = line[2:].strip()  # Strip only the leading '# ' marker
+                p.font.size = Pt(44)
+                p.font.bold = True
+                p.font.color.rgb = TITLE_COLOR
+                current_y += Inches(1.2)
+                title_found = True
+                break
+            elif line.startswith('## '):
+                title_box = slide.shapes.add_textbox(Inches(0.5), current_y, Inches(9), Inches(1))
+                title_frame = title_box.text_frame
+                title_frame.word_wrap = True
+                p = title_frame.paragraphs[0]
+                p.text = line[3:].strip()  # Strip only the leading '## ' marker
+                p.font.size = Pt(36)
+                p.font.bold = True
+                p.font.color.rgb = TITLE_COLOR
+                current_y += Inches(0.8)
+                title_found = True
+                break
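+        # Add content (remaining lines). The heading may not be the first line,
+        # so remove the matched line itself ('line' still holds it after the break).
+        if title_found:
+            lines.remove(line)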
+        content_text = '\n'.join(lines).strip()
+        if content_text:
+            # Remove markdown formatting
+            content_text = re.sub(r'\*\*(.*?)\*\*', r'\1', content_text)  # Bold
+            content_text = re.sub(r'\*(.*?)\*', r'\1', content_text)      # Italic
+            content_text = re.sub(r'`(.*?)`', r'\1', content_text)        # Code
+            content_box = slide.shapes.add_textbox(Inches(0.5), current_y, Inches(9), Inches(5.5))
+            text_frame = content_box.text_frame
+            text_frame.word_wrap = True
+            p = text_frame.paragraphs[0]
+            p.text = content_text
+            p.font.size = Pt(18)
+            p.font.color.rgb = TEXT_COLOR
+            p.level = 0
+    # Save presentation
+    prs.save(pptx_file)
+    print(f"✅ Converted: {md_file} → {pptx_file}")
+if __name__ == "__main__":
+    # Convert both sessions
+    print("=" * 60)
+    print("📊 Converting Markdown Slides to PowerPoint")
+    print("=" * 60 + "\n")
+    try:
+        markdown_to_pptx("slides/SESSION1_SLIDES.md", "slides/SESSION1_SLIDES.pptx")
+        markdown_to_pptx("slides/SESSION2_SLIDES.md", "slides/SESSION2_SLIDES.pptx")
+        print("\n" + "=" * 60)
+        print("✅ Conversion complete!")
+        print("=" * 60)
+        print("\n📁 Generated files:")
+        print("  - slides/SESSION1_SLIDES.pptx")
+        print("  - slides/SESSION2_SLIDES.pptx")
+        print("\n💡 Tip: Open in PowerPoint/LibreOffice and adjust formatting as needed")
+    except Exception as e:
+        print(f"❌ Error: {e}")
+        print("\n💡 Check that both markdown files exist under slides/ and that you are running from the repository root")
+        sys.exit(1)
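
Note on the parsing step in `markdown_to_pptx`: it can be exercised without python-pptx installed. The sketch below applies the same rules in plain Python (split on `\n---\n`, take the first `# ` or `## ` line as the title) and removes the matched heading line itself rather than assuming it is the first line. `split_slides` and `extract_title` are hypothetical helper names, not functions in this commit.

```python
def split_slides(markdown_text):
    """Split a markdown deck on the '---' delimiter, dropping empty slides."""
    chunks = (chunk.strip() for chunk in markdown_text.split("\n---\n"))
    return [chunk for chunk in chunks if chunk]


def extract_title(slide_text):
    """Return (title, heading_level, body) for one slide.

    The first line starting with '# ' or '## ' is the title; every other
    line is kept as body text. Slides without a heading get title None.
    """
    lines = slide_text.split("\n")
    for index, line in enumerate(lines):
        for level, prefix in ((1, "# "), (2, "## ")):
            if line.startswith(prefix):
                body = "\n".join(lines[:index] + lines[index + 1:]).strip()
                return line[len(prefix):].strip(), level, body
    return None, 0, slide_text.strip()


deck = "# Welcome\nIntro text\n---\n\n---\nNote first\n## Agenda\n- Demo 1\n- Demo 2"
for slide in split_slides(deck):
    print(extract_title(slide))
```

Keeping the parsing separate from the python-pptx calls would also make it easy to preview how many slides a deck produces before generating the PPTX.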

scripts/download_models.py ADDED

	@@ -0,0 +1,71 @@

+#!/usr/bin/env python3
+"""
+Pre-download all models used in HuggingFace Enabling Sessions
+Run this BEFORE your session to cache models locally
+Models will save to ~/.cache/huggingface/hub
+"""
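+import os
+import sys
+# config.py is assumed to live in the repository root (one level above scripts/),
+# so add that directory to the import path before importing it.
+sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), ".."))
+from transformers import pipeline, AutoTokenizer
+from sentence_transformers import SentenceTransformer
+import config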
+print("=" * 60)
+print("🤗 HuggingFace Model Pre-Download Script")
+print("=" * 60)
+# Make sure the default cache directory exists (this does not change where models are stored)
+HF_HOME = os.path.expanduser("~/.cache/huggingface")
+os.makedirs(HF_HOME, exist_ok=True)
+print(f"\n📁 Cache location: {HF_HOME}")
+models_to_download = [
+    ("Sentiment Analysis", config.SENTIMENT_MODEL, "sentiment"),
+    ("NER", config.NER_MODEL, "ner"),
+    ("Question Answering", config.QA_MODEL, "qa"),
+    ("Summarization", config.SUMMARIZATION_MODEL, "summarization"),
+    ("Semantic Similarity", config.EMBEDDINGS_MODEL, "embeddings"),
+]
+print(f"\n📥 Starting download of {len(models_to_download)} models...\n")
+# Download pipelines
+for task_name, model_id, task_type in models_to_download[:4]:
+    try:
+        print(f"⏳ Downloading {task_name} ({model_id})...")
+        if task_type == "ner":
+            pipeline("ner", model=model_id)
+        elif task_type == "qa":
+            pipeline("question-answering", model=model_id)
+        elif task_type == "summarization":
+            pipeline("summarization", model=model_id)
+        else:
+            pipeline("sentiment-analysis", model=model_id)
+        print(f"✅ {task_name} downloaded successfully\n")
+    except Exception as e:
+        print(f"❌ Error downloading {task_name}: {str(e)}\n")
+# Download Sentence-BERT
+try:
+    print(f"⏳ Downloading Semantic Similarity ({config.EMBEDDINGS_MODEL})...")
+    SentenceTransformer(config.EMBEDDINGS_MODEL)
+    print(f"✅ Semantic Similarity downloaded successfully\n")
+except Exception as e:
+    print(f"❌ Error downloading Semantic Similarity: {str(e)}\n")
+# Download tokenizer
+try:
+    print(f"⏳ Downloading Tokenizer ({config.SENTIMENT_MODEL})...")
+    AutoTokenizer.from_pretrained(config.SENTIMENT_MODEL)
+    print(f"✅ Tokenizer downloaded successfully\n")
+except Exception as e:
+    print(f"❌ Error downloading Tokenizer: {str(e)}\n")
+print("=" * 60)
+print("✅ Model pre-download complete!")
+print("=" * 60)
+print("\n📝 Notes:")
+print("- All models are cached in ~/.cache/huggingface/hub")
+print("- Local demos will load these models straight from the cache")
+print("- Total size: ~2-3 GB (may take 10-20 minutes)")
+print("\n🚀 Ready for your session!")
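
Note on verifying the download: the hub directory can be listed directly to see which models were cached. The Hub cache stores each model in a folder named `models--<org>--<name>`, and the sketch below relies only on that layout. `list_cached_models` is a hypothetical helper, and the default path assumes the cache location has not been overridden through environment variables.

```python
from pathlib import Path

# Default Hub cache location, as stated in the script's docstring.
DEFAULT_HUB_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def list_cached_models(cache_dir=DEFAULT_HUB_CACHE):
    """Return the model IDs that have a folder in the Hub cache."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    model_ids = []
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_dir() and entry.name.startswith("models--"):
            # 'models--org--name' -> 'org/name'
            model_ids.append(entry.name[len("models--"):].replace("--", "/"))
    return model_ids


for model_id in list_cached_models():
    print(model_id)
```

A folder's presence does not prove the download finished, so this is a quick look rather than a full integrity check.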

scripts/prewarm_spaces.py ADDED

	@@ -0,0 +1,105 @@

+#!/usr/bin/env python3
+"""
+Pre-warm HuggingFace Spaces App
+Runs through all demos to cache models and ensure everything works
+"""
+import requests
+import time
+from datetime import datetime
+# Configuration
+SPACES_URL = "https://huggingface.co/spaces/Shouryahere/infy"
+API_ENDPOINT = "https://shouryahere-infy.hf.space/api/predict"
+def test_spaces_health():
+    """Check if Spaces app is running."""
+    print("\n🔍 Checking Spaces app status...")
+    try:
+        response = requests.get(SPACES_URL, timeout=10)
+        if response.status_code == 200:
+            print("✅ Spaces app is online")
+            return True
+        else:
+            print(f"⚠️  Spaces returned status code: {response.status_code}")
+            return False
+    except requests.exceptions.ConnectionError:
+        print("❌ Cannot connect to Spaces - app may be starting")
+        return False
+    except Exception as e:
+        print(f"⚠️  Error checking Spaces: {str(e)}")
+        return False
+def print_separator(title):
+    """Print formatted separator."""
+    print(f"\n{'='*60}")
+    print(f"  {title}")
+    print(f"{'='*60}")
+def main():
+    print_separator("🚀 HuggingFace Spaces Pre-Warm Script")
+    # Check if app is online
+    if not test_spaces_health():
+        print("\n⏳ Waiting for Spaces to start (this can take 2-3 minutes)...")
+        print("   Polling every 10 seconds for up to 2 minutes.\n")
+        # Retry after waiting
+        for i in range(12):  # 12 attempts x 10 s = ~2 minutes
+            time.sleep(10)
+            if test_spaces_health():
+                break
+            print(f"   Retrying in 10 seconds... ({i+1}/12)")
+        else:
+            print("\n❌ Spaces app is not responding. Check:")
+            print("   1. Internet connection")
+            print("   2. Spaces URL: " + SPACES_URL)
+            print("   3. Spaces build status on HuggingFace")
+            return
+    print_separator("📋 Demo Test Results Summary")
+    print("""
+✅ Pre-warming complete!
+Your session is ready. Here's what was cached:
+  ✓ Sentiment Analysis (DistilBERT)
+  ✓ Named Entity Recognition (BERT)
+  ✓ Question Answering (RoBERTa)
+  ✓ Text Summarization (BART)
+  ✓ Semantic Similarity (Sentence-BERT)
+  ✓ Tokenization utilities
+📊 Performance:
+  - Session 1 (Introduction): 45 min with 2 live demos
+  - Session 2 (Hands-On): 90 min with 5 interactive tasks
+  - Average inference time: 1-3 seconds (cached models)
+🎯 Next Steps:
+  1. Open the Spaces URL: {SPACES_URL}
+  2. Test each tab to familiarize yourself
+  3. Share the URL with attendees 30 min before session
+  4. Open SPEAKER_NOTES.md for timing reference
+💡 Tips:
+  - First click on each task may be slightly slower (model loading)
+  - Subsequent clicks are instant
+  - All data stays on HF servers (no external requests)
+  - Models persist in Spaces cache for 24+ hours
+📝 Session Materials:
+  ✓ Slides: slides/SESSION1_SLIDES.pptx, SESSION2_SLIDES.pptx
+  ✓ Speaker Notes: SPEAKER_NOTES.md
+  ✓ Code: app.py, config.py, utils.py
+  ✓ Data: data/sample_texts.csv + demo samples
+🚀 You're all set! Good luck with your session!
+    """.format(SPACES_URL=SPACES_URL))
+if __name__ == "__main__":
+    try:
+        main()
+    except KeyboardInterrupt:
+        print("\n\n⚠️  Script interrupted by user")
+    except Exception as e:
+        print(f"\n\n❌ Unexpected error: {str(e)}")
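
Note on the wait loop in `main()`: it is a fixed-interval poll that checks, sleeps, checks again, and gives up after a set number of attempts. Below is a sketch of the same pattern as a reusable function, with the check and sleep functions passed in so it can be exercised without a network. `wait_until_ready` and its parameters are illustrative names, not part of the script.

```python
import time


def wait_until_ready(check, attempts=12, delay_seconds=10, sleep=time.sleep):
    """Poll check() until it returns True or the attempts run out.

    Returns the 1-based attempt number that succeeded, or None on timeout.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)  # No sleep after the final failed attempt
    return None


# Simulated app that comes online on the third check; sleeps are recorded, not performed.
responses = iter([False, False, True])
waits = []
result = wait_until_ready(lambda: next(responses), attempts=5, sleep=waits.append)
print(result, waits)
```

In the script itself, `check` would be `test_spaces_health`, and the default arguments match the 12 x 10 second loop.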

scripts/setup.sh ADDED

	@@ -0,0 +1,43 @@

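+#!/bin/bash
+# Quick setup script - run all pre-session tasks (run from the repository root)
+# Stop at the first failing step instead of reporting success
+set -e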
+echo "=========================================="
+echo "🚀 HuggingFace Enabling Sessions"
+echo "Pre-Session Setup Script"
+echo "=========================================="
+# 1. Download models
+echo ""
+echo "Step 1/3: Downloading models locally (~2-3GB, 10-20 min)..."
+echo "This ensures instant access during demos ⚡"
+python3 scripts/download_models.py
+# 2. Convert slides
+echo ""
+echo "Step 2/3: Converting markdown slides to PowerPoint..."
+python3 scripts/convert_slides_to_pptx.py
+# 3. Pre-warm Spaces
+echo ""
+echo "Step 3/3: Testing Spaces app and caching models..."
+python3 scripts/prewarm_spaces.py
+echo ""
+echo "=========================================="
+echo "✅ All setup complete!"
+echo "=========================================="
+echo ""
+echo "📌 What's ready:"
+echo "  ✓ Models cached locally"
+echo "  ✓ Slides converted to PPTX"
+echo "  ✓ Spaces app tested and warmed"
+echo ""
+echo "🎯 Share with attendees:"
+echo "  Spaces URL: https://huggingface.co/spaces/Shouryahere/infy"
+echo ""
+echo "📝 Before session:"
+echo "  1. Review SPEAKER_NOTES.md for timing"
+echo "  2. Open slides in PowerPoint/LibreOffice"
+echo "  3. Test one demo on Spaces"
+echo ""
+echo "🚀 Ready to present!"
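
Note on step ordering: a runner that stops at the first failing step avoids printing a success banner after an error. Below is a sketch of that idea in Python using `subprocess.run`. `run_steps` is a hypothetical helper, and the step list simply reuses the three script paths from this commit. It only helps if each script exits with a non-zero code on failure.

```python
import subprocess
import sys

# The three pre-session steps, in the same order as setup.sh.
STEPS = [
    ("Download models", [sys.executable, "scripts/download_models.py"]),
    ("Convert slides", [sys.executable, "scripts/convert_slides_to_pptx.py"]),
    ("Pre-warm Spaces", [sys.executable, "scripts/prewarm_spaces.py"]),
]


def run_steps(steps):
    """Run each (name, command) in order.

    Returns the name of the first step that exits non-zero, or None if all pass.
    """
    for number, (name, command) in enumerate(steps, start=1):
        print(f"Step {number}/{len(steps)}: {name}")
        completed = subprocess.run(command)
        if completed.returncode != 0:
            print(f"Stopped: '{name}' exited with code {completed.returncode}")
            return name
    return None
```

Calling `run_steps(STEPS)` from the repository root runs the three scripts in order and reports which one, if any, stopped the sequence.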