| ---
|
| title: VoiceVerse AI
|
| emoji: 🎙️
|
| colorFrom: indigo
|
| colorTo: purple
|
| sdk: gradio
|
| sdk_version: "5.23.1"
|
| python_version: "3.10"
|
| app_file: app.py
|
| pinned: false
|
| ---
|
|
|
| # 🎙️ VoiceVerse AI: Document to Audio
|
|
|
| Transform uploaded documents into engaging, emotionally expressive podcast-style audio narrations.
|
|
|
| ## Pipeline
|
|
|
| ```
|
| PDF/TXT → Text Extraction → RAG (chunk + embed + retrieve) → Script Generation (Mistral-7B) → TTS (Qwen3-TTS / Edge-TTS) → Audio Playback
|
| ```
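
The RAG stage of the pipeline (chunk + embed + retrieve) can be sketched roughly as follows. This is a minimal illustration, not the code in `rag.py`: the function names, chunk sizes, and `top_k` are assumptions, and the bag-of-words `embed` function is only a dependency-free stand-in for the `all-MiniLM-L6-v2` embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Stand-in embedding: a bag-of-words count vector.

    Assumption: the real app would return a dense vector from
    all-MiniLM-L6-v2 here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

The retrieved chunks would then be passed to the script generation step as context. Overlapping windows are used so that a sentence split across a chunk boundary still appears whole in at least one chunk.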
|
|
|
| ## Models Used
|
|
|
| | Component | Model | Where it runs |
|
| |-----------|-------|-----|
|
| | Embeddings | `all-MiniLM-L6-v2` | Local (CPU) |
|
| | Script Gen | `Mistral-7B-Instruct-v0.3` | HF Inference API |
|
| | TTS (primary) | `Qwen3-TTS` | HF Inference API |
|
| | TTS (fallback) | `Edge-TTS` (AriaNeural voice) | `edge-tts` client on CPU (calls Microsoft's online TTS service) |
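
The primary/fallback split for TTS in the table above could be wired up along these lines. This is a hedged sketch rather than the contents of `tts.py`: `synthesize_with_fallback` and the two backend callables are hypothetical names, each backend is assumed to take text and return audio bytes, and the real app may catch narrower exceptions.

```python
import logging

logger = logging.getLogger("voiceverse.tts")


def synthesize_with_fallback(text, primary, fallback):
    """Try the primary TTS backend; if it raises, use the fallback.

    `primary` and `fallback` are hypothetical callables: text -> audio bytes.
    Returns a tuple of (audio_bytes, backend_label).
    """
    if not text.strip():
        raise ValueError("nothing to synthesize")
    try:
        return primary(text), "primary"
    except Exception as exc:  # e.g. API timeout, quota limit, model still loading
        logger.warning("Primary TTS failed (%r); using fallback", exc)
        return fallback(text), "fallback"
```

Returning the backend label alongside the audio lets the UI tell the user which voice engine produced the narration.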
|
|
|
| ## Setup
|
|
|
| ```bash
|
| pip install -r requirements.txt
|
| export HF_TOKEN="your_huggingface_token_here"
|
| python app.py
|
| ```
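
Since the script generation and primary TTS steps both go through the HF Inference API, a missing token is worth catching at startup. One way to do that is sketched below, assuming a hypothetical helper `get_hf_token` that is not necessarily part of the project:

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token, or raise a readable error.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    # Also reject the placeholder value copied verbatim from the setup snippet.
    if not token or token == "your_huggingface_token_here":
        raise RuntimeError(
            "HF_TOKEN is not set. Export it before running `python app.py`."
        )
    return token
```

Calling this once near the top of `app.py` would surface a clear message before any model request is made.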
|
|
|
| ## Deployment on HF Spaces
|
|
|
| 1. Create a new Space (Gradio SDK)
|
| 2. Upload all project files
|
| 3. Set `HF_TOKEN` as a Space Secret
|
| 4. The app will auto-launch on port 7860
|
|
|
| ## Project Structure
|
|
|
| ```
|
| app.py # Gradio UI entry point
|
| rag.py # Document ingestion, chunking, embedding, retrieval
|
| script_gen.py # LLM script generation (Mistral-7B-Instruct)
|
| tts.py # Text-to-speech (Qwen3-TTS + Edge-TTS fallback)
|
| utils.py # Helpers (temp files, validation, error formatting)
|
| requirements.txt # Python dependencies
|
| packages.txt # System packages (ffmpeg)
|
| ```
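
As an illustration of the kind of validation helper `utils.py` is described as holding, the sketch below checks an uploaded file before it reaches text extraction. The function name, the error messages, and the 10 MB size limit are assumptions for illustration, not values taken from the project.

```python
from pathlib import Path

# The pipeline accepts PDF and TXT input.
ALLOWED_SUFFIXES = {".pdf", ".txt"}


def validate_upload(path, max_bytes=10 * 1024 * 1024):
    """Return None if the file looks usable, otherwise an error message.

    The default size limit is an illustrative assumption.
    """
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        return f"Unsupported file type '{p.suffix}'. Please upload a PDF or TXT file."
    if not p.is_file():
        return f"File not found: {p.name}"
    size = p.stat().st_size
    if size == 0:
        return "The uploaded file is empty."
    if size > max_bytes:
        return f"File is too large ({size} bytes; limit is {max_bytes})."
    return None
```

Returning a message instead of raising keeps the Gradio handler simple: it can show the string to the user and stop before any model call is made.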
|
|
|