	---
	title: VoiceVerse AI
	emoji: 🎙️
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: "5.23.1"
	python_version: "3.10"
	app_file: app.py
	pinned: false
	---

	# 🎙️ VoiceVerse AI — Document to Audio

	Transform uploaded PDF or TXT documents into emotionally expressive, podcast-style audio narration.

	## Pipeline

	```
	PDF/TXT → Text Extraction → RAG (chunk + embed + retrieve) → Script Generation (Mistral-7B) → TTS (Qwen3-TTS / Edge-TTS) → Audio Playback
	```
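
The chunk and retrieve steps of the RAG stage can be sketched roughly as follows. This is an illustration only, not the code in `rag.py`: the function names are hypothetical, and a simple word-count vector stands in for the `all-MiniLM-L6-v2` embeddings so the example runs without downloading a model.

```python
import math
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of `chunk_size` words that overlap by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    # Assumption: stand-in for a real embedding model (lowercased word counts).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the `top_k` chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]
```

In a real pipeline, `embed` would presumably be replaced by a call to the embedding model, and the retrieved chunks would be passed to the script generation step as context.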

	## Models Used

	| Component | Model | How |
	|-----------|-------|-----|
	| Embeddings | `all-MiniLM-L6-v2` | Local (CPU) |
	| Script Gen | `Mistral-7B-Instruct-v0.3` | HF Inference API |
	| TTS (primary) | `Qwen3-TTS` | HF Inference API |
	| TTS (fallback) | `Edge-TTS (AriaNeural)` | Local (CPU) |
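
The primary/fallback arrangement for TTS could look something like the sketch below. It assumes each engine is wrapped in a function that takes text and returns audio bytes; `synthesize_with_fallback` and the engine callables are hypothetical names, and the actual logic in `tts.py` may differ.

```python
from typing import Callable

# Assumption: each TTS engine is wrapped as a function from text to audio bytes.
Synthesizer = Callable[[str], bytes]


def synthesize_with_fallback(
    text: str, primary: Synthesizer, fallback: Synthesizer
) -> tuple[bytes, str]:
    """Try the primary engine; if it raises, use the fallback engine instead.

    Returns the audio bytes and a label naming which engine produced them.
    """
    try:
        return primary(text), "primary"
    except Exception as exc:  # e.g. API timeout, rate limit, missing token
        print(f"Primary TTS failed ({exc!r}); using fallback.")
        return fallback(text), "fallback"
```

Returning the engine label alongside the audio makes it easy for the UI to show which voice was actually used.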

	## Setup

	```bash
	pip install -r requirements.txt
	export HF_TOKEN="your_huggingface_token_here"
	python app.py
	```
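
Since two of the models are reached through the HF Inference API, a missing `HF_TOKEN` is a likely first-run failure. A minimal sketch of an early check is shown below; `get_hf_token` is a hypothetical helper, not necessarily something present in `utils.py`.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Read HF_TOKEN from the environment, failing early with a clear message."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Export it locally or add it as a Space Secret."
        )
    return token
```

Calling `get_hf_token()` once at startup surfaces the problem before any document is processed.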

	## Deployment on HF Spaces

	1. Create a new Space (Gradio SDK)
	2. Upload all project files
	3. Set `HF_TOKEN` as a Space Secret
	4. The app will auto-launch on port 7860

	## Project Structure

	```
	app.py            # Gradio UI entry point
	rag.py            # Document ingestion, chunking, embedding, retrieval
	script_gen.py     # LLM script generation (Mistral-7B-Instruct)
	tts.py            # Text-to-speech (Qwen3-TTS + Edge-TTS fallback)
	utils.py          # Helpers (temp files, validation, error formatting)
	requirements.txt  # Python dependencies
	packages.txt      # System packages (ffmpeg)
	```