---

title: VoiceVerse AI
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.23.1"
python_version: "3.10"
app_file: app.py
pinned: false
---


# πŸŽ™οΈ VoiceVerse AI β€” Document to Audio

Transform uploaded documents into engaging, emotionally expressive podcast-style audio narrations.

## Pipeline

```
PDF/TXT → Text Extraction → RAG (chunk + embed + retrieve) → Script Generation (Mistral-7B) → TTS (Qwen3-TTS / Edge-TTS) → Audio Playback
```
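The RAG stage of the pipeline (chunk, embed, retrieve) can be illustrated with a minimal, dependency-free sketch. It is not the code in `rag.py`: the function names, the word-window chunking, and the default sizes are all assumptions, and `embed` is a toy bag-of-words stand-in for the `all-MiniLM-L6-v2` sentence embeddings the project uses.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows that overlap by `overlap` words (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

In a real run, `embed` would be replaced by a call to the embedding model, and the retrieved chunks would be passed to the script generation step as context.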

## Models Used

| Component | Model | How |
|-----------|-------|-----|
| Embeddings | `all-MiniLM-L6-v2` | Local (CPU) |
| Script Gen | `Mistral-7B-Instruct-v0.3` | HF Inference API |
| TTS (primary) | `Qwen3-TTS` | HF Inference API |
| TTS (fallback) | `Edge-TTS (AriaNeural)` | Local (CPU) |
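
The primary/fallback split in the table suggests a simple try-then-fall-back pattern. The sketch below shows one way to express it, assuming each TTS backend is wrapped as a function that takes text and returns audio bytes. The `Synthesizer` type and `synthesize_with_fallback` are hypothetical names, not taken from `tts.py`.

```python
from typing import Callable

# Hypothetical interface: a synthesizer maps text to raw audio bytes.
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, primary: Synthesizer, fallback: Synthesizer
) -> tuple[bytes, str]:
    """Try the primary backend; use the fallback if it raises or returns nothing.

    Returns the audio bytes and a label naming the backend that produced them.
    """
    try:
        audio = primary(text)
        if audio:
            return audio, "primary"
    except Exception as exc:  # e.g. API timeout, rate limit, model unavailable
        print(f"Primary TTS failed ({exc!r}); using fallback.")
    return fallback(text), "fallback"
```

Returning the backend label alongside the audio makes it easy for the UI to tell the user which voice engine was used.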

## Setup

```bash
pip install -r requirements.txt
export HF_TOKEN="your_huggingface_token_here"
python app.py
```
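
Because the hosted models are called through the HF Inference API, a missing `HF_TOKEN` is a likely first-run failure. A startup check along these lines could surface it early. This is a sketch only: `get_hf_token` is a hypothetical helper, and the optional `env` parameter exists purely to make it testable.

```python
import os


def get_hf_token(env=None) -> str:
    """Read HF_TOKEN from the environment and fail fast with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Export it in your shell or add it as a Space Secret."
        )
    return token
```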

## Deployment on HF Spaces

1. Create a new Space (Gradio SDK)
2. Upload all project files
3. Set `HF_TOKEN` as a Space Secret
4. The app will auto-launch on port 7860
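
For local runs that should mirror the Space, the host and port can be made explicit. The sketch below assumes `app.py` builds a Gradio app object (called `demo` here, a hypothetical name) and passes these values to its `launch()` method. Gradio already honors `GRADIO_SERVER_PORT` on its own, so the helper only spells out the default.

```python
import os

DEFAULT_PORT = 7860  # the port the Space serves on, per the steps above


def launch_kwargs(env=None) -> dict:
    """Build keyword arguments for Gradio's launch() (hypothetical helper)."""
    env = os.environ if env is None else env
    port = int(env.get("GRADIO_SERVER_PORT", DEFAULT_PORT))
    # 0.0.0.0 makes the server reachable from outside a container.
    return {"server_name": "0.0.0.0", "server_port": port}


# Assumed usage in app.py, where `demo` is the Gradio app object:
#     demo.launch(**launch_kwargs())
```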

## Project Structure

```
app.py           # Gradio UI entry point
rag.py           # Document ingestion, chunking, embedding, retrieval
script_gen.py    # LLM script generation (Mistral-7B-Instruct)
tts.py           # Text-to-speech (Qwen3-TTS + Edge-TTS fallback)
utils.py         # Helpers (temp files, validation, error formatting)
requirements.txt # Python dependencies
packages.txt     # System packages (ffmpeg)
```