---
title: WhisperMath
emoji: 🧮
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

# WhisperMath Web Demo

WhisperMath is an interactive demo for converting spoken mathematical phrases into rendered math notation.

```text
browser microphone
  -> faster-whisper transcript
  -> ByT5 math decoder
  -> rendered KaTeX output
```

The demo has two useful modes:

- **Record audio**: speak a math expression in the browser.
- **Edit transcript**: correct Whisper's transcript and click **Decode Transcript** to test only the ByT5 decoder.

This separation is important because spoken-math errors can come from two different places:

- Whisper may hear the audio incorrectly.
- The ByT5 decoder may convert a correct transcript incorrectly.

## Live Demo

Hugging Face Space:

```text
https://huggingface.co/spaces/vibhuiitj/whispermath-webdemo
```

Direct app URL:

```text
https://vibhuiitj-whispermath-webdemo.hf.space
```

The public Space is configured for free CPU:

```text
Whisper: small.en
Decoder: vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
Device: CPU
```

## Files

```text
webdemo/
  app.py              FastAPI backend
  static/index.html   Browser UI with recorder and KaTeX rendering
  requirements.txt    Python dependencies
  Dockerfile          Hugging Face Spaces Docker image
  .dockerignore       Files ignored by Docker build
  README.md           Space metadata and this guide
```

## How It Works

1. The browser records audio using `MediaRecorder`.
2. The frontend uploads the recording to `POST /api/transcribe`.
3. The backend saves the audio to a temporary file.
4. `faster-whisper` transcribes the audio into English text.
5. The transcript is passed to the ByT5 checkpoint.
6. The ByT5 output is returned as raw math/LaTeX-like text.
7. The frontend renders the output using KaTeX and also shows the raw model output for debugging.

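
Steps 3 to 6 above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the code in `app.py`: `run_pipeline`, `transcribe`, and `decode` are hypothetical names, and the two callables stand in for the faster-whisper and ByT5 calls so the example runs without downloading any model.

```python
import os
import tempfile
from typing import Callable, Dict


def run_pipeline(
    audio_bytes: bytes,
    transcribe: Callable[[str], str],
    decode: Callable[[str], str],
    suffix: str = ".webm",
) -> Dict[str, str]:
    """Save uploaded audio to a temp file, transcribe it, then decode the transcript.

    `transcribe` and `decode` are hypothetical stand-ins for the speech model
    and the math decoder; they are injected so the flow can be tested alone.
    """
    # Step 3: persist the upload so the speech model can read it from disk.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        # Step 4: audio file -> English text.
        transcript = transcribe(path).strip()
    finally:
        # Always remove the temporary recording, even if transcription fails.
        os.remove(path)
    # Steps 5-6: English text -> raw math/LaTeX-like text.
    math_text = decode(transcript)
    return {"transcript": transcript, "math_text": math_text}


if __name__ == "__main__":
    # Stub callables: no audio is actually decoded here.
    fake_transcribe = lambda path: " x squared minus y squared equals four "
    fake_decode = lambda text: "x^2-y^2=4"
    print(run_pipeline(b"\x00\x01", fake_transcribe, fake_decode))
```

Keeping the two stages as separate callables mirrors the two demo modes: the transcript can be inspected or corrected before it reaches the decoder.
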
## Local Setup

From this folder:

```bash
cd /Users/vaibhav/Desktop/beyond/whispermath/webdemo
```

You can use the existing Phase 3 virtualenv:

```bash
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python -m pip install -r requirements.txt
```

Or create a fresh virtualenv:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```

## Run Locally

CPU-friendly default:

```bash
uvicorn app:app --host 127.0.0.1 --port 8766
```

Then open:

```text
http://127.0.0.1:8766
```

If using the Phase 3 virtualenv directly:

```bash
/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
  -m uvicorn app:app --host 127.0.0.1 --port 8766
```

## Model Configuration

The app is controlled with environment variables.

```bash
export WHISPERMATH_WHISPER_MODEL=small.en
export WHISPERMATH_WHISPER_DEVICE=cpu
export WHISPERMATH_WHISPER_COMPUTE_TYPE=int8
export WHISPERMATH_DECODER_MODEL=vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
export WHISPERMATH_DECODER_DEVICE=auto
```

Whisper model options:

```text
tiny.en     Fastest, weakest transcription
base.en     Better quality, still fairly light
small.en    Good CPU default for the public Space
medium.en   Better transcription, slower and heavier
```

For local testing with medium Whisper:

```bash
WHISPERMATH_WHISPER_MODEL=medium.en \
  /Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
  -m uvicorn app:app --host 127.0.0.1 --port 8766
```

For the free Hugging Face CPU Space, keep:

```bash
WHISPERMATH_WHISPER_MODEL=small.en
WHISPERMATH_DECODER_DEVICE=cpu
```

## API Endpoints

### Health

```bash
curl http://127.0.0.1:8766/api/health
```

Example:

```json
{
  "status": "ok",
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
  "decoder_device": "cpu"
}
```

### Decode Text Only

Use this when you want to test the ByT5 decoder without audio:

```bash
curl -X POST \
  http://127.0.0.1:8766/api/decode \
  -H "Content-Type: application/json" \
  -d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
```

Example response:

```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```

### Transcribe Audio

```bash
curl -X POST http://127.0.0.1:8766/api/transcribe \
  -F audio=@/path/to/audio.wav \
  -F num_beams=4 \
  -F max_new_tokens=256
```

Example response:

```json
{
  "transcript": "integral from zero to pi of sine x dx.",
  "math_text": "\\int_0^\\pi \\sin x dx.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.72,
      "text": "integral from zero to pi of sine x dx."
    }
  ],
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```

## Deploy To Hugging Face Spaces

This folder is ready for a Docker Space.

### 1. Login

Do not commit or paste tokens into files.

```bash
huggingface-cli login
```

Or use the Python API with your local authenticated session.

### 2. Create The Space

```python
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    space_sdk="docker",
    private=False,
    exist_ok=True,
)
```

### 3. Upload The Folder

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    folder_path="/Users/vaibhav/Desktop/beyond/whispermath/webdemo",
    path_in_repo=".",
    ignore_patterns=[
        "__pycache__/*",
        "*.pyc",
        ".DS_Store",
        ".venv/*",
        "audio/*",
        "outputs/*",
    ],
    commit_message="Deploy WhisperMath web demo",
)
```

### 4. Check Runtime

```python
from huggingface_hub import HfApi

runtime = HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo")
print(runtime)
```

Expected final state:

```text
stage='RUNNING'
hardware='cpu-basic'
requested_hardware='cpu-basic'
```

### 5. Test The Deployed Space

Health:

```bash
curl https://vibhuiitj-whispermath-webdemo.hf.space/api/health
```

Text decode:

```bash
curl -L -X POST https://vibhuiitj-whispermath-webdemo.hf.space/api/decode \
  -H "Content-Type: application/json" \
  -d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
```

Expected:

```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```

## Docker Notes

The Docker image:

- Uses `python:3.11-slim`
- Installs `libgomp1`, needed by some CPU inference dependencies
- Runs on port `7860`, the Hugging Face Spaces default
- Sets `WHISPERMATH_WHISPER_MODEL=small.en`
- Sets `WHISPERMATH_DECODER_DEVICE=cpu`
- Sets `HF_HUB_DISABLE_XET=1`

`HF_HUB_DISABLE_XET=1` is included because local testing showed large model downloads could get stuck with incomplete Xet-backed cache files.

## Troubleshooting

### Space Takes A Long Time To Start

The first start downloads:

- `Systran/faster-whisper-small.en`
- `vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724`

On free CPU, startup can take a few minutes.

### Audio Transcription Is Wrong

Try these in order:

1. Speak shorter phrases.
2. Use clearer operator words, for example `over` instead of `by`.
3. Check the editable transcript box.
4. Correct the transcript manually.
5. Click **Decode Transcript**.
6. Try `base.en`, `small.en`, or `medium.en` locally.

### ByT5 Output Is Wrong But Transcript Is Correct

That means the decoder needs more targeted training data. Common weak phrases include:

```text
by / divided by / over
whole square
derivative of ...
limit as ...
fraction with grouped numerator and denominator
```

Use the editable transcript box to collect failure cases.

### KaTeX Rendering Fails

The app still shows the raw ByT5 output under **Raw ByT5 Output**.
If the raw output is malformed LaTeX-like text, KaTeX may render an error-colored expression or show fallback text.

### Medium Whisper Hangs During Download

If a local download leaves an incomplete cache file, run:

```bash
HF_HUB_DISABLE_XET=1 python - <<'PY'
from huggingface_hub import snapshot_download

snapshot_download("Systran/faster-whisper-medium.en", max_workers=1)
PY
```

Then restart:

```bash
HF_HUB_DISABLE_XET=1 WHISPERMATH_WHISPER_MODEL=medium.en \
  python -m uvicorn app:app --host 127.0.0.1 --port 8766
```

## Security

Never commit Hugging Face tokens or API keys into this folder. If a token is pasted into a chat or terminal history by mistake, revoke/rotate it from Hugging Face settings.
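
One way to back up the rule above is a quick scan of the folder before uploading. The following is a minimal sketch, not part of this project: `find_token_like_strings` is a hypothetical helper, and the pattern assumes that Hugging Face user access tokens start with `hf_` followed by a long alphanumeric run. It can miss other kinds of secrets and may flag harmless strings.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumption: access tokens look like "hf_" plus 20 or more letters/digits.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{20,}")


def find_token_like_strings(folder: str) -> List[Tuple[str, int]]:
    """Return (file path, line number) pairs for lines that look like tokens."""
    hits: List[Tuple[str, int]] = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            # Skip binary or unreadable files instead of failing the scan.
            continue
        for line_number, line in enumerate(text.splitlines(), start=1):
            if TOKEN_PATTERN.search(line):
                hits.append((str(path), line_number))
    return hits


if __name__ == "__main__":
    # Demo on a throwaway folder containing a fake, made-up token string.
    with tempfile.TemporaryDirectory() as folder:
        fake = "hf_" + "a" * 30
        Path(folder, "notes.txt").write_text("line one\ntoken=" + fake + "\n", encoding="utf-8")
        for file_path, line_number in find_token_like_strings(folder):
            print(f"possible token in {file_path} at line {line_number}")
```

A scan like this only reports candidates; any real token it finds should still be revoked, since it may already be in version history.
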