PatnaikAshish committed on
Commit 240687c · verified · 1 Parent(s): 1e2b6b3

Update README.md

Files changed (1)
  1. README.md +205 -0
README.md CHANGED
@@ -13,3 +13,208 @@ short_description: Kokoro, But It Clones Voices Now
13
  ---
14
 
15
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
16
+ # KokoClone
17
+
18
+ [![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Live%20Demo-blue)](https://huggingface.co/spaces/PatnaikAshish/kokoclone)
19
+ [![Hugging Face Models](https://img.shields.io/badge/🤗%20Models-Repository-orange)](https://huggingface.co/PatnaikAshish/kokoclone)
20
+ ![Python](https://img.shields.io/badge/Python-3.10+-3776AB.svg?logo=python&logoColor=white)
21
+ [![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
22
+
23
+
24
+
25
+ ## What is KokoClone?
26
+
27
+ **KokoClone** is a fast, real-time-capable multilingual voice cloning system built on top of **Kokoro-ONNX**, a fast open-source neural TTS engine.
28
+
29
+ It allows you to:
30
+
31
+ * Type text in multiple languages
32
+ * Provide a short 3–10 second reference audio clip
33
+ * Instantly generate speech in that same voice
34
+
35
+
36
+ Just text → voice → cloned output.
37
+
38
+
39
+ ## Why Kokoro?
40
+
41
+ KokoClone is powered by **Kokoro-ONNX**, a highly optimized neural TTS engine designed for:
42
+
43
+ * Extremely fast inference
44
+ * Natural prosody and expressive speech
45
+ * Lightweight ONNX runtime compatibility
46
+ * Real-time deployment on CPU
47
+ * Even faster performance with GPU
48
+
49
+ Unlike many heavy TTS systems, Kokoro is lightweight and responsive — making KokoClone suitable for real-time applications, voice assistants, demos, and interactive tools.
50
+
51
+
52
+ ## Features
53
+
54
+ ### Multilingual Speech Generation
55
+
56
+ Generate native speech in:
57
+
58
+ * English (`en`)
59
+ * Hindi (`hi`)
60
+ * French (`fr`)
61
+ * Japanese (`ja`)
62
+ * Chinese (`zh`)
63
+ * Italian (`it`)
64
+ * Portuguese (`pt`)
65
+ * Spanish (`es`)
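The language codes above are the values the `lang` option expects. A minimal sketch of validating a code before starting synthesis follows. `SUPPORTED_LANGS` and `normalize_lang` are hypothetical helpers written for this illustration and are not part of KokoClone.

```python
# Language codes copied from the list in this README.
SUPPORTED_LANGS = {
    "en": "English",
    "hi": "Hindi",
    "fr": "French",
    "ja": "Japanese",
    "zh": "Chinese",
    "it": "Italian",
    "pt": "Portuguese",
    "es": "Spanish",
}


def normalize_lang(code: str) -> str:
    """Return a lowercase, validated language code (hypothetical helper)."""
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED_LANGS:
        valid = ", ".join(sorted(SUPPORTED_LANGS))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {valid}")
    return cleaned
```

Failing early on an unknown code avoids loading models only to hit an error later.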
66
+
67
+
68
+ ### Zero-Shot Voice Cloning
69
+
70
+ Upload a short voice sample and KokoClone transfers its vocal characteristics to the generated speech.
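This README suggests a 3–10 second reference clip. The sketch below checks a clip's duration before cloning, using only Python's standard `wave` module, so it assumes an uncompressed PCM WAV file. The function names and the strict use of the 3–10 second range as bounds are assumptions made for illustration; KokoClone itself may accept other lengths and formats.

```python
import wave

MIN_SECONDS = 3.0   # lower bound suggested in this README
MAX_SECONDS = 10.0  # upper bound suggested in this README


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_good_reference(path: str) -> bool:
    """True if the clip length falls inside the suggested 3-10 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```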
71
+
72
+
73
+ ### Real-Time Friendly
74
+
75
+ Built on Kokoro’s efficient ONNX runtime pipeline, KokoClone runs smoothly on:
76
+
77
+ * Standard laptops (CPU)
78
+ * Workstations (GPU)
79
+
80
+
81
+ ### Automatic Model Handling
82
+
83
+ On first run, required model files are downloaded automatically and placed in the correct directories.
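The README does not show how the download step is implemented. As a rough sketch of the general "download only if missing" pattern, assuming a plain HTTP download, something like the following would work. `ensure_file` is a hypothetical helper, and the actual file names, URLs, and download mechanism used by KokoClone are not given here.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def ensure_file(
    path: Path,
    url: str,
    fetch: Callable[[str, Path], object] = urlretrieve,
) -> bool:
    """Download `url` to `path` unless the file already exists.

    Returns True if a download happened, False if the file was already there.
    `fetch` is injectable so the logic can be exercised without network access.
    """
    if path.exists():
        return False
    # Create the target directory (e.g. model/ or voice/) on demand.
    path.parent.mkdir(parents=True, exist_ok=True)
    fetch(url, path)
    return True
```

Because the check is based on the file's presence, later runs skip the download entirely.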
84
+
85
+
86
+ ### Built-in Web Interface
87
+
88
+ Includes a clean and responsive Gradio UI for quick testing and demos.
89
+
90
+
91
+
92
+ ## Live Demo
93
+
94
+ Try it instantly without installing anything:
95
+
96
+ 👉 **[KokoClone on Hugging Face Spaces](https://huggingface.co/spaces/PatnaikAshish/kokoclone)**
97
+
98
+
99
+
100
+ ## Installation
101
+
102
+ Recommended: Use `conda` for a clean environment.
103
+
104
+ ### Clone the Repository
105
+
106
+ ```bash
+ git clone https://github.com/Ashish-Patnaik/kokoclone.git
+ cd kokoclone
+ ```
110
+
111
+ ### Create Environment
112
+
113
+ ```bash
+ conda create -n kokoclone python=3.12.12 -y
+ conda activate kokoclone
+ ```
117
+
118
+
119
+
120
+ ## Install Dependencies
121
+
122
+ ### CPU Installation (Recommended for most users)
123
+
124
+ ```bash
+ pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
+ pip install -r requirements.txt
+ ```
128
+
129
+ ### GPU Installation (NVIDIA users)
130
+
131
+ ```bash
+ pip install -r requirements.txt
+ # Quote the extra so shells such as zsh do not treat the brackets as a glob
+ pip install "kokoro-onnx[gpu]"
+ ```
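After a GPU install it can be useful to confirm that a CUDA backend is actually visible. The sketch below assumes the GPU setup relies on ONNX Runtime's `CUDAExecutionProvider`, which is an assumption about the environment and not something this README states. `has_cuda_provider` and `available_providers` are hypothetical helper names.

```python
def has_cuda_provider(providers: list[str]) -> bool:
    """True if ONNX Runtime's CUDA execution provider is in the list."""
    return "CUDAExecutionProvider" in providers


def available_providers() -> list[str]:
    """Return ONNX Runtime's available providers, or [] if it is not installed."""
    try:
        import onnxruntime
    except ImportError:
        return []
    return list(onnxruntime.get_available_providers())


if __name__ == "__main__":
    providers = available_providers()
    print("Providers:", providers)
    print("CUDA available:", has_cuda_provider(providers))
```

If only `CPUExecutionProvider` is listed, inference would be expected to fall back to the CPU.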
135
+
136
+
137
+
138
+ ## Usage
139
+
140
+ KokoClone can be used in three ways:
141
+
142
+
143
+
144
+ ### Web Interface
145
+
146
+ Launch the Gradio app:
147
+
148
+ ```bash
+ python app.py
+ ```
151
+
152
+ Then open the browser interface to:
153
+
154
+ * Enter text
155
+ * Select language
156
+ * Upload a reference voice
157
+ * Generate cloned speech
158
+
159
+
160
+
161
+ ### Command Line
162
+
163
+ ```bash
+ python cli.py --text "Hello from KokoClone" --lang en --ref reference.wav --out output.wav
+ ```
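The same command can be driven from another Python script. This is a minimal sketch that only uses the four flags shown above (`--text`, `--lang`, `--ref`, `--out`) and assumes it is run from the repository root so that `cli.py` resolves. `build_cli_command` and `run_cli` are hypothetical wrapper names.

```python
import subprocess
import sys


def build_cli_command(text: str, lang: str, ref: str, out: str) -> list[str]:
    """Build the argument list for cli.py using the flags shown in this README."""
    return [
        sys.executable, "cli.py",
        "--text", text,
        "--lang", lang,
        "--ref", ref,
        "--out", out,
    ]


def run_cli(text: str, lang: str, ref: str, out: str) -> None:
    """Run cli.py and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_cli_command(text, lang, ref, out), check=True)
```

Passing the arguments as a list avoids shell quoting problems with text that contains spaces or quotes.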
166
+
167
+
168
+
169
+ ### Python API
170
+
171
+ ```python
+ from core.cloner import KokoClone
+
+ # Required model files are downloaded automatically on first run
+ cloner = KokoClone()
+
+ # Synthesize the text in the voice of the reference clip and save it as a WAV file
+ cloner.generate(
+     text="This voice is cloned using KokoClone.",
+     lang="en",
+     reference_audio="reference.wav",
+     output_path="output.wav",
+ )
+ ```
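To clone the same voice across several languages, the `generate` call above can be wrapped in a small batch helper. The sketch below relies only on the four keyword arguments shown in the example. `build_jobs`, `run_jobs`, and the `output_<lang>.wav` naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def build_jobs(texts_by_lang: dict[str, str], reference_audio: str, out_dir: str) -> list[dict]:
    """Create one keyword-argument dict per language for `generate`."""
    jobs = []
    for lang, text in texts_by_lang.items():
        jobs.append({
            "text": text,
            "lang": lang,
            "reference_audio": reference_audio,
            "output_path": str(Path(out_dir) / f"output_{lang}.wav"),
        })
    return jobs


def run_jobs(cloner, jobs: list[dict]) -> list[str]:
    """Call `cloner.generate` once per job and return the output paths."""
    for job in jobs:
        cloner.generate(**job)
    return [job["output_path"] for job in jobs]
```

With the real class this would be called as `run_jobs(KokoClone(), build_jobs({"en": "Hello", "fr": "Bonjour"}, "reference.wav", "out"))`, assuming the `out` directory already exists.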
183
+
184
+
185
+
186
+ ## Project Structure
187
+
188
+ ```
+ app.py          → Gradio Web Interface
+ cli.py          → Command-line tool
+ core/cloner.py  → Core inference engine
+ inference.py    → Example usage script
+ model/          → Downloaded TTS model weights
+ voice/          → Voice embeddings
+ ```
196
+
197
+
198
+
199
+ ## Use Cases
200
+
201
+ * Voice assistant prototypes
202
+ * Real-time TTS demos
203
+ * Multilingual narration tools
204
+ * Content creation
205
+ * Research experiments
206
+ * Interactive AI applications
207
+
208
+
209
+
210
+ ## Acknowledgments
211
+
212
+ This project builds upon:
213
+
214
+ * **Kokoro-ONNX** — for fast and efficient neural speech synthesis
215
+ * **Kanade Tokenizer** — for voice conversion architecture
216
+
217
+
218
+ ## License
219
+
220
+ Licensed under the Apache 2.0 License.