vibhuiitj committed on
Commit
27f52a8
·
verified ·
1 Parent(s): 95c3887

Expand README with setup and deployment steps

Files changed (1)
  1. README.md +331 -14
README.md CHANGED
@@ -10,25 +10,95 @@ pinned: false
10
 
11
  # WhisperMath Web Demo
12
 
13
- Interactive local demo:
14
 
15
  ```text
16
- browser microphone -> faster-whisper transcript -> ByT5 math decoder output
17
  ```
18
 
19
- ## Setup
20
 
21
- Use the Phase 3 virtualenv or create a new one:
22
 
23
  ```bash
24
  cd /Users/vaibhav/Desktop/beyond/whispermath/webdemo
25
  python3 -m venv .venv
26
  source .venv/bin/activate
27
  python -m pip install --upgrade pip
28
  python -m pip install -r requirements.txt
29
  ```
30
 
31
- ## Run
32
 
33
  ```bash
34
  uvicorn app:app --host 127.0.0.1 --port 8766
@@ -40,7 +110,16 @@ Then open:
40
  http://127.0.0.1:8766
41
  ```
42
 
43
- ## Useful Environment Variables
44
 
45
  ```bash
46
  export WHISPERMATH_WHISPER_MODEL=small.en
@@ -50,26 +129,264 @@ export WHISPERMATH_DECODER_MODEL=vibhuiitj/byt5-base-whispermath-a100-checkpoint
50
  export WHISPERMATH_DECODER_DEVICE=auto
51
  ```
52
 
53
- For faster but weaker transcription:
54
 
55
  ```bash
56
- export WHISPERMATH_WHISPER_MODEL=tiny.en
57
  ```
58
 
59
- For decent quality with lower latency:
60
 
61
  ```bash
62
- export WHISPERMATH_WHISPER_MODEL=base.en
 
63
  ```
64
 
65
- For better but slower transcription:
66
 
67
  ```bash
68
- export WHISPERMATH_WHISPER_MODEL=small.en
69
  ```
70
 
71
- For local testing with medium Whisper:
72
 
73
  ```bash
74
- export WHISPERMATH_WHISPER_MODEL=medium.en
75
  ```
10
 
11
  # WhisperMath Web Demo
12
 
13
+ WhisperMath is an interactive demo for converting spoken mathematical phrases into rendered math notation.
14
 
15
  ```text
16
+ browser microphone
17
+ -> faster-whisper transcript
18
+ -> ByT5 math decoder
19
+ -> rendered KaTeX output
20
  ```
21
 
22
+ The demo has two useful modes:
23
 
24
+ - **Record audio**: speak a math expression in the browser.
25
+ - **Edit transcript**: correct Whisper's transcript and click **Decode Transcript** to test only the ByT5 decoder.
26
+
27
+ This separation is important because spoken-math errors can come from two different places:
28
+
29
+ - Whisper may hear the audio incorrectly.
30
+ - The ByT5 decoder may convert a correct transcript incorrectly.
31
+
32
+ ## Live Demo
33
+
34
+ Hugging Face Space:
35
+
36
+ ```text
37
+ https://huggingface.co/spaces/vibhuiitj/whispermath-webdemo
38
+ ```
39
+
40
+ Direct app URL:
41
+
42
+ ```text
43
+ https://vibhuiitj-whispermath-webdemo.hf.space
44
+ ```
45
+
46
+ The public Space runs on the free CPU tier with this configuration:
47
+
48
+ ```text
49
+ Whisper: small.en
50
+ Decoder: vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
51
+ Device: CPU
52
+ ```
53
+
54
+ ## Files
55
+
56
+ ```text
57
+ webdemo/
58
+ app.py FastAPI backend
59
+ static/index.html Browser UI with recorder and KaTeX rendering
60
+ requirements.txt Python dependencies
61
+ Dockerfile Hugging Face Spaces Docker image
62
+ .dockerignore Files ignored by Docker build
63
+ README.md Space metadata and this guide
64
+ ```
65
+
66
+ ## How It Works
67
+
68
+ 1. The browser records audio using `MediaRecorder`.
69
+ 2. The frontend uploads the recording to `POST /api/transcribe`.
70
+ 3. The backend saves the audio to a temporary file.
71
+ 4. `faster-whisper` transcribes the audio into English text.
72
+ 5. The transcript is passed to the ByT5 checkpoint.
73
+ 6. The ByT5 output is returned as raw math/LaTeX-like text.
74
+ 7. The frontend renders the output using KaTeX and also shows the raw model output for debugging.
75
+
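The seven steps above can be sketched as two composable stages. This is a minimal illustration, not the code in `app.py`: `run_pipeline`, `decode_only`, `PipelineResult`, and the two stand-in stage functions are hypothetical names, and the real backend would call `faster-whisper` and the ByT5 checkpoint where the stand-ins are used here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcript: str  # text produced by the speech-to-text stage
    math_text: str   # raw math/LaTeX-like text produced by the decoder stage


def run_pipeline(
    audio_bytes: bytes,
    transcribe: Callable[[bytes], str],
    decode: Callable[[str], str],
) -> PipelineResult:
    """Run audio -> transcript -> math text and keep both intermediate outputs."""
    transcript = transcribe(audio_bytes)
    return PipelineResult(transcript=transcript, math_text=decode(transcript))


def decode_only(transcript: str, decode: Callable[[str], str]) -> PipelineResult:
    """Skip transcription, mirroring the 'Edit transcript' mode."""
    return PipelineResult(transcript=transcript, math_text=decode(transcript))


# Stand-in stages (assumptions) so the sketch runs without downloading models.
def fake_transcribe(audio: bytes) -> str:
    return "x squared minus y squared equals four"


def fake_decode(text: str) -> str:
    return "x^2-y^2=4" if text == "x squared minus y squared equals four" else ""


result = run_pipeline(b"\x00\x01", fake_transcribe, fake_decode)
print(result.transcript)
print(result.math_text)
```

Returning the transcript alongside the math text is what makes it possible to tell a Whisper error from a decoder error when the final output looks wrong.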
76
+ ## Local Setup
77
+
78
+ From this folder:
79
 
80
  ```bash
81
  cd /Users/vaibhav/Desktop/beyond/whispermath/webdemo
82
+ ```
83
+
84
+ You can use the existing Phase 3 virtualenv:
85
+
86
+ ```bash
87
+ /Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python -m pip install -r requirements.txt
88
+ ```
89
+
90
+ Or create a fresh virtualenv:
91
+
92
+ ```bash
93
  python3 -m venv .venv
94
  source .venv/bin/activate
95
  python -m pip install --upgrade pip
96
  python -m pip install -r requirements.txt
97
  ```
98
 
99
+ ## Run Locally
100
+
101
+ CPU-friendly default:
102
 
103
  ```bash
104
  uvicorn app:app --host 127.0.0.1 --port 8766
 
110
  http://127.0.0.1:8766
111
  ```
112
 
113
+ If using the Phase 3 virtualenv directly:
114
+
115
+ ```bash
116
+ /Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
117
+ -m uvicorn app:app --host 127.0.0.1 --port 8766
118
+ ```
119
+
120
+ ## Model Configuration
121
+
122
+ The app is controlled with environment variables.
123
 
124
  ```bash
125
  export WHISPERMATH_WHISPER_MODEL=small.en
 
129
  export WHISPERMATH_DECODER_DEVICE=auto
130
  ```
131
 
132
+ Whisper model options:
133
+
134
+ ```text
135
+ tiny.en Fastest, weakest transcription
136
+ base.en Better quality, still fairly light
137
+ small.en Good CPU default for the public Space
138
+ medium.en Better transcription, slower and heavier
139
+ ```
140
+
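As a rough sketch of how a backend might read these variables, the snippet below collects them into one config object. The variable names come from this README; the `DemoConfig` class, the `load_config` helper, and the fallback values are assumptions for illustration and may not match what `app.py` actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DemoConfig:
    whisper_model: str
    decoder_model: str
    decoder_device: str


def load_config(env: Mapping[str, str] = os.environ) -> DemoConfig:
    """Read the WHISPERMATH_* variables, falling back to assumed defaults."""
    return DemoConfig(
        whisper_model=env.get("WHISPERMATH_WHISPER_MODEL", "small.en"),
        decoder_model=env.get(
            "WHISPERMATH_DECODER_MODEL",
            "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
        ),
        # How "auto" is resolved to a concrete device is left to the app.
        decoder_device=env.get("WHISPERMATH_DECODER_DEVICE", "auto"),
    )


# Passing a plain dict instead of os.environ makes the helper easy to test.
cpu_space = load_config(
    {"WHISPERMATH_WHISPER_MODEL": "small.en", "WHISPERMATH_DECODER_DEVICE": "cpu"}
)
print(cpu_space)
```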
141
+ For local testing with medium Whisper:
142
 
143
  ```bash
144
+ WHISPERMATH_WHISPER_MODEL=medium.en \
145
+ /Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
146
+ -m uvicorn app:app --host 127.0.0.1 --port 8766
147
  ```
148
 
149
+ For the free Hugging Face CPU Space, keep:
150
 
151
  ```bash
152
+ WHISPERMATH_WHISPER_MODEL=small.en
153
+ WHISPERMATH_DECODER_DEVICE=cpu
154
  ```
155
 
156
+ ## API Endpoints
157
+
158
+ ### Health
159
 
160
  ```bash
161
+ curl http://127.0.0.1:8766/api/health
162
  ```
163
 
164
+ Example response:
165
+
166
+ ```json
167
+ {
168
+ "status": "ok",
169
+ "whisper_model": "small.en",
170
+ "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
171
+ "decoder_device": "cpu"
172
+ }
173
+ ```
174
+
175
+ ### Decode Text Only
176
+
177
+ Use this when you want to test the ByT5 decoder without audio:
178
+
179
+ ```bash
180
+ curl -X POST http://127.0.0.1:8766/api/decode \
181
+ -H "Content-Type: application/json" \
182
+ -d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
183
+ ```
184
+
185
+ Example response:
186
+
187
+ ```json
188
+ {
189
+ "transcript": "x squared minus y squared equals four",
190
+ "math_text": "x^2-y^2=4",
191
+ "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
192
+ }
193
+ ```
194
+
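The same decode call can be made from Python using only the standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON fields shown in the `curl` example; `build_decode_request` and `decode_text` are hypothetical helper names, not part of the demo.

```python
import json
import urllib.request


def build_decode_request(base_url, text, num_beams=1, max_new_tokens=128):
    """Build (but do not send) a POST request for the /api/decode endpoint."""
    payload = {"text": text, "num_beams": num_beams, "max_new_tokens": max_new_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/decode",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_text(base_url, text, **options):
    """Send the request and return the parsed JSON response as a dict."""
    request = build_decode_request(base_url, text, **options)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running server; decode_text() does.
request = build_decode_request(
    "http://127.0.0.1:8766", "x squared minus y squared equals four"
)
print(request.get_method(), request.full_url)
```

With the local server running, `decode_text("http://127.0.0.1:8766", "x squared minus y squared equals four")["math_text"]` should return the decoder output shown in the example response.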
195
+ ### Transcribe Audio
196
 
197
  ```bash
198
+ curl -X POST http://127.0.0.1:8766/api/transcribe \
199
+ -F audio=@/path/to/audio.wav \
200
+ -F num_beams=4 \
201
+ -F max_new_tokens=256
202
  ```
203
+
204
+ Example response:
205
+
206
+ ```json
207
+ {
208
+ "transcript": "integral from zero to pi of sine x dx.",
209
+ "math_text": "\\int_0^\\pi \\sin x dx.",
210
+ "segments": [
211
+ {
212
+ "start": 0.0,
213
+ "end": 2.72,
214
+ "text": "integral from zero to pi of sine x dx."
215
+ }
216
+ ],
217
+ "whisper_model": "small.en",
218
+ "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
219
+ }
220
+ ```
221
+
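A response like the one above can be summarized on the client side. The sketch below parses the example payload and reports the segment count and the end time of the last segment; `summarize_transcription` is a hypothetical helper, and it assumes every segment carries `start`, `end`, and `text` as in the example.

```python
import json

# The example response from above, kept as raw JSON text.
SAMPLE_RESPONSE = r'''
{
  "transcript": "integral from zero to pi of sine x dx.",
  "math_text": "\\int_0^\\pi \\sin x dx.",
  "segments": [
    {"start": 0.0, "end": 2.72, "text": "integral from zero to pi of sine x dx."}
  ],
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
'''


def summarize_transcription(payload: dict) -> dict:
    """Collect the fields most useful when debugging a transcription."""
    segments = payload.get("segments", [])
    return {
        "transcript": payload["transcript"],
        "math_text": payload["math_text"],
        "segment_count": len(segments),
        # End time of the last segment approximates the spoken duration.
        "audio_seconds": max((seg["end"] for seg in segments), default=0.0),
    }


summary = summarize_transcription(json.loads(SAMPLE_RESPONSE))
print(summary)
```

Note that the doubled backslashes are JSON escaping: after parsing, `math_text` contains single backslashes, which is the form KaTeX expects.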
222
+ ## Deploy To Hugging Face Spaces
223
+
224
+ This folder is ready for a Docker Space.
225
+
226
+ ### 1. Login
227
+
228
+ Do not commit or paste tokens into files.
229
+
230
+ ```bash
231
+ huggingface-cli login
232
+ ```
233
+
234
+ Or use the Python API with your local authenticated session.
235
+
236
+ ### 2. Create The Space
237
+
238
+ ```python
239
+ from huggingface_hub import HfApi
240
+
241
+ api = HfApi()
242
+ api.create_repo(
243
+ repo_id="vibhuiitj/whispermath-webdemo",
244
+ repo_type="space",
245
+ space_sdk="docker",
246
+ private=False,
247
+ exist_ok=True,
248
+ )
249
+ ```
250
+
251
+ ### 3. Upload The Folder
252
+
253
+ ```python
254
+ from huggingface_hub import HfApi
255
+
256
+ api = HfApi()
257
+ api.upload_folder(
258
+ repo_id="vibhuiitj/whispermath-webdemo",
259
+ repo_type="space",
260
+ folder_path="/Users/vaibhav/Desktop/beyond/whispermath/webdemo",
261
+ path_in_repo=".",
262
+ ignore_patterns=[
263
+ "__pycache__/*",
264
+ "*.pyc",
265
+ ".DS_Store",
266
+ ".venv/*",
267
+ "audio/*",
268
+ "outputs/*",
269
+ ],
270
+ commit_message="Deploy WhisperMath web demo",
271
+ )
272
+ ```
273
+
274
+ ### 4. Check Runtime
275
+
276
+ ```python
277
+ from huggingface_hub import HfApi
278
+
279
+ runtime = HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo")
280
+ print(runtime)
281
+ ```
282
+
283
+ Expected final state:
284
+
285
+ ```text
286
+ stage='RUNNING'
287
+ hardware='cpu-basic'
288
+ requested_hardware='cpu-basic'
289
+ ```
290
+
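Because a new Space may build for a while before it reaches `RUNNING`, it can help to poll the stage instead of checking once. The helper below is a sketch: `wait_for_stage` is a hypothetical name, the list of failure stages is an assumption, and the demo at the bottom feeds it a scripted sequence rather than calling Hugging Face.

```python
import time
from typing import Callable, Iterable


def wait_for_stage(
    get_stage: Callable[[], str],
    target: str = "RUNNING",
    failure_stages: Iterable[str] = ("BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"),
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_stage() until it returns target, fails, or the poll budget runs out."""
    failures = set(failure_stages)
    for _ in range(max_polls):
        stage = get_stage()
        if stage == target:
            return stage
        if stage in failures:
            raise RuntimeError(f"Space entered failure stage: {stage}")
        sleep(poll_seconds)
    raise TimeoutError(f"Space did not reach {target} after {max_polls} polls")


# Scripted stages stand in for the real API so the sketch runs offline.
# Against a real Space, get_stage could be:
#   lambda: str(HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo").stage)
stages = iter(["BUILDING", "BUILDING", "RUNNING"])
final_stage = wait_for_stage(lambda: next(stages), sleep=lambda seconds: None)
print(final_stage)
```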
291
+ ### 5. Test The Deployed Space
292
+
293
+ Health:
294
+
295
+ ```bash
296
+ curl https://vibhuiitj-whispermath-webdemo.hf.space/api/health
297
+ ```
298
+
299
+ Text decode:
300
+
301
+ ```bash
302
+ curl -L -X POST https://vibhuiitj-whispermath-webdemo.hf.space/api/decode \
303
+ -H "Content-Type: application/json" \
304
+ -d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
305
+ ```
306
+
307
+ Expected:
308
+
309
+ ```json
310
+ {
311
+ "transcript": "x squared minus y squared equals four",
312
+ "math_text": "x^2-y^2=4",
313
+ "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
314
+ }
315
+ ```
316
+
317
+ ## Docker Notes
318
+
319
+ The Docker image:
320
+
321
+ - Uses `python:3.11-slim`
322
+ - Installs `libgomp1`, needed by some CPU inference dependencies
323
+ - Runs on port `7860`, the Hugging Face Spaces default
324
+ - Sets `WHISPERMATH_WHISPER_MODEL=small.en`
325
+ - Sets `WHISPERMATH_DECODER_DEVICE=cpu`
326
+ - Sets `HF_HUB_DISABLE_XET=1`
327
+
328
+ `HF_HUB_DISABLE_XET=1` is included because local testing showed large model downloads could get stuck with incomplete Xet-backed cache files.
329
+
330
+ ## Troubleshooting
331
+
332
+ ### Space Takes A Long Time To Start
333
+
334
+ The first start downloads:
335
+
336
+ - `Systran/faster-whisper-small.en`
337
+ - `vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724`
338
+
339
+ On the free CPU tier, startup can take a few minutes.
340
+
341
+ ### Audio Transcription Is Wrong
342
+
343
+ Try these in order:
344
+
345
+ 1. Speak shorter phrases.
346
+ 2. Use clearer operator words, for example `over` instead of `by`.
347
+ 3. Check the editable transcript box.
348
+ 4. Correct the transcript manually.
349
+ 5. Click **Decode Transcript**.
350
+ 6. Try `base.en`, `small.en`, or `medium.en` locally.
351
+
352
+ ### ByT5 Output Is Wrong But Transcript Is Correct
353
+
354
+ That usually points to gaps in the decoder's training data rather than a transcription problem. Common weak phrases include:
355
+
356
+ ```text
357
+ by / divided by / over
358
+ whole square
359
+ derivative of ...
360
+ limit as ...
361
+ fraction with grouped numerator and denominator
362
+ ```
363
+
364
+ Use the editable transcript box to collect failure cases.
365
+
366
+ ### KaTeX Rendering Fails
367
+
368
+ The app still shows the raw ByT5 output under **Raw ByT5 Output**. If the raw output is malformed LaTeX-like text, KaTeX may render an error-colored expression or show fallback text.
369
+
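One common kind of malformed output is unbalanced braces. The check below is a rough pre-filter for triaging failure cases, not a reimplementation of KaTeX's parser; `find_brace_problem` is a hypothetical helper, and output that passes it can still fail to render for other reasons.

```python
from typing import Optional


def find_brace_problem(math_text: str) -> Optional[str]:
    """Return a short description of the first brace imbalance, or None if balanced."""
    depth = 0
    i = 0
    while i < len(math_text):
        ch = math_text[i]
        if ch == "\\":
            # Skip the backslash and the next character, so \{ and \} are ignored.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return f"unmatched '}}' at index {i}"
        i += 1
    if depth > 0:
        return f"{depth} unclosed '{{'"
    return None


for candidate in ["\\frac{x+1}{y}", "\\frac{x+1}{y", "x^2}"]:
    print(repr(candidate), "->", find_brace_problem(candidate))
```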
370
+ ### Medium Whisper Hangs During Download
371
+
372
+ If a local download leaves an incomplete cache file, run:
373
+
374
+ ```bash
375
+ HF_HUB_DISABLE_XET=1 python - <<'PY'
376
+ from huggingface_hub import snapshot_download
377
+ snapshot_download("Systran/faster-whisper-medium.en", max_workers=1)
378
+ PY
379
+ ```
380
+
381
+ Then restart:
382
+
383
+ ```bash
384
+ HF_HUB_DISABLE_XET=1 WHISPERMATH_WHISPER_MODEL=medium.en \
385
+ python -m uvicorn app:app --host 127.0.0.1 --port 8766
386
+ ```
387
+
388
+ ## Security
389
+
390
+ Never commit Hugging Face tokens or API keys into this folder.
391
+
392
+ If a token is pasted into a chat or ends up in terminal history by mistake, revoke or rotate it in your Hugging Face settings.