---
title: VoiceVerse AI
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.23.1"
python_version: "3.10"
app_file: app.py
pinned: false
---
# 🎙️ VoiceVerse AI: Document to Audio
Transform uploaded documents into engaging, emotionally expressive podcast-style audio narrations.
## Pipeline
```
PDF/TXT → Text Extraction → RAG (chunk + embed + retrieve) → Script Generation (Mistral-7B) → TTS (Qwen3-TTS / Edge-TTS) → Audio Playback
```
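The RAG stage (chunk, embed, retrieve) can be sketched in a few lines. This is an illustration only, not the project's `rag.py`: it swaps the `all-MiniLM-L6-v2` embeddings for a plain bag-of-words vector so that it runs with the standard library alone, and the names `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=50, overlap=10):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Stand-in embedding: lowercase word counts (assumption; a real pipeline uses a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

The retrieved chunks would then be passed to the script-generation step as context. Overlapping windows are a common way to avoid cutting a sentence's context at a chunk boundary; the sizes here are arbitrary defaults.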
## Models Used
| Component | Model | How |
|-----------|-------|-----|
| Embeddings | `all-MiniLM-L6-v2` | Local (CPU) |
| Script Gen | `Mistral-7B-Instruct-v0.3` | HF Inference API |
| TTS (primary) | `Qwen3-TTS` | HF Inference API |
| TTS (fallback) | `Edge-TTS (AriaNeural)` | `edge-tts` package (Microsoft online service) |
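
The primary/fallback split for TTS suggests a try-then-fall-back pattern. Below is a minimal sketch, assuming each backend is wrapped in a callable that takes text and returns audio bytes. `synthesize_with_fallback` and both backend callables are hypothetical, and the real `tts.py` may handle failures differently.

```python
import logging

logger = logging.getLogger("tts")


def synthesize_with_fallback(text, primary, fallback):
    """Try the primary TTS backend; if it raises, use the fallback.

    Returns (audio_bytes, backend_name). Both backends are plain callables
    taking text and returning bytes (an assumption made for this sketch).
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    try:
        return primary(text), "primary"
    except Exception as exc:  # e.g. network error, rate limit, model still loading
        logger.warning("Primary TTS failed (%s); using fallback", exc)
        return fallback(text), "fallback"
```

Returning the backend name alongside the audio lets the UI tell the user which voice engine produced the result.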
## Setup
```bash
pip install -r requirements.txt
export HF_TOKEN="your_huggingface_token_here"
python app.py
```
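Since script generation and the primary TTS both go through the HF Inference API, a missing `HF_TOKEN` is an easy mistake to hit at startup. A small check along these lines fails early with a readable message. `require_hf_token` is a hypothetical helper, not one of the listed modules.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            'HF_TOKEN is not set. Run: export HF_TOKEN="your_huggingface_token_here"'
        )
    return token
```

Calling it once at the top of `app.py` would surface the problem before any model request is made.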
## Deployment on HF Spaces
1. Create a new Space (Gradio SDK)
2. Upload all project files
3. Set `HF_TOKEN` as a Space Secret
4. The app will auto-launch on port 7860
## Project Structure
```
app.py # Gradio UI entry point
rag.py # Document ingestion, chunking, embedding, retrieval
script_gen.py # LLM script generation (Mistral-7B-Instruct)
tts.py # Text-to-speech (Qwen3-TTS + Edge-TTS fallback)
utils.py # Helpers (temp files, validation, error formatting)
requirements.txt # Python dependencies
packages.txt # System packages (ffmpeg)
```