	---
	title: WhisperMath
	emoji: 🧮
	colorFrom: green
	colorTo: blue
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# WhisperMath Web Demo

	WhisperMath is an interactive demo for converting spoken mathematical phrases into rendered math notation.

	```text
	browser microphone
	-> faster-whisper transcript
	-> ByT5 math decoder
	-> rendered KaTeX output
	```

	The demo has two useful modes:

	- Record audio: speak a math expression in the browser.
	- Edit transcript: correct Whisper's transcript and click Decode Transcript to test only the ByT5 decoder.

	This separation is important because spoken-math errors can come from two different places:

	- Whisper may hear the audio incorrectly.
	- The ByT5 decoder may convert a correct transcript incorrectly.
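The triage rule can be made concrete. The helper below is a hypothetical sketch, not part of `app.py`. Given a reference transcript and reference math for a recording, it reports which stage went wrong. The normalization (lowercase, punctuation stripped) and the whitespace-insensitive math comparison are assumptions; Whisper often adds a trailing period, as in the transcription example response further down.

```python
import re


def normalize_transcript(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace (assumed rule)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def blame_stage(expected_transcript: str, got_transcript: str,
                expected_math: str, got_math: str) -> str:
    """Return "whisper", "decoder", or "ok" depending on where the error first appears."""
    if normalize_transcript(expected_transcript) != normalize_transcript(got_transcript):
        # Whisper misheard the audio, so the decoder never saw the right input.
        return "whisper"
    if expected_math.replace(" ", "") != got_math.replace(" ", ""):
        # The transcript was right, so the mismatch comes from ByT5.
        return "decoder"
    return "ok"
```

Checking the transcript first matters: a decoder mistake can only be diagnosed once the input it received is known to be correct.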

	## Live Demo

	Hugging Face Space:

	```text
	https://huggingface.co/spaces/vibhuiitj/whispermath-webdemo
	```

	Direct app URL:

	```text
	https://vibhuiitj-whispermath-webdemo.hf.space
	```

The public Space runs on the free CPU tier (`cpu-basic`) with this configuration:

	```text
	Whisper: small.en
	Decoder: vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
	Device: CPU
	```

	## Files

```text
webdemo/
  app.py              FastAPI backend
  static/
    index.html        Browser UI with recorder and KaTeX rendering
  requirements.txt    Python dependencies
  Dockerfile          Hugging Face Spaces Docker image
  .dockerignore       Files ignored by Docker build
  README.md           Space metadata and this guide
```

	## How It Works

	1. The browser records audio using `MediaRecorder`.
	2. The frontend uploads the recording to `POST /api/transcribe`.
	3. The backend saves the audio to a temporary file.
	4. `faster-whisper` transcribes the audio into English text.
	5. The transcript is passed to the ByT5 checkpoint.
	6. The ByT5 output is returned as raw math/LaTeX-like text.
	7. The frontend renders the output using KaTeX and also shows the raw model output for debugging.
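Steps 3 to 6 can be sketched in Python as follows. This is a hypothetical outline, not the actual `app.py`: the function name, the injected `transcribe`/`decode` callables, and the `Segment` class are assumptions, and the model-name fields of the real response are omitted. Only the `transcript`, `math_text`, and `segments` keys come from the example responses under API Endpoints.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Segment:
    # Mirrors the start/end/text attributes of faster-whisper segments.
    start: float
    end: float
    text: str


def transcribe_and_decode(
    audio_bytes: bytes,
    suffix: str,
    transcribe: Callable[[str], Iterable[Segment]],
    decode: Callable[[str], str],
) -> dict:
    """Save the upload, transcribe it, decode the transcript, and build the JSON body."""
    # Step 3: write the uploaded audio to a temporary file and pass its path on.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        # Step 4: consume all segments before the file is deleted
        # (faster-whisper yields segments lazily).
        segments = list(transcribe(path))
    finally:
        os.remove(path)
    transcript = " ".join(s.text.strip() for s in segments).strip()
    # Steps 5-6: hand the transcript to ByT5 and return its raw output unchanged.
    math_text = decode(transcript)
    return {
        "transcript": transcript,
        "math_text": math_text,
        "segments": [
            {"start": s.start, "end": s.end, "text": s.text.strip()} for s in segments
        ],
    }
```

With real models, `transcribe` would wrap `WhisperModel(...).transcribe(path)[0]` from faster-whisper, and `decode` would wrap a `generate` call on the ByT5 checkpoint. Keeping both injectable lets the sketch run without downloading any weights.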

	## Local Setup

Change into the `webdemo/` folder (the path below is the author's local checkout; substitute your own):

	```bash
	cd /Users/vaibhav/Desktop/beyond/whispermath/webdemo
	```

If you already have the virtualenv from the Phase 3 decoder project, you can reuse it:

	```bash
	/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python -m pip install -r requirements.txt
	```

	Or create a fresh virtualenv:

	```bash
	python3 -m venv .venv
	source .venv/bin/activate
	python -m pip install --upgrade pip
	python -m pip install -r requirements.txt
	```

	## Run Locally

Start the server with the CPU-friendly defaults:

	```bash
	uvicorn app:app --host 127.0.0.1 --port 8766
	```

	Then open:

	```text
	http://127.0.0.1:8766
	```

	If using the Phase 3 virtualenv directly:

	```bash
	/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
	-m uvicorn app:app --host 127.0.0.1 --port 8766
	```

	## Model Configuration

	The app is controlled with environment variables.

	```bash
	export WHISPERMATH_WHISPER_MODEL=small.en
	export WHISPERMATH_WHISPER_DEVICE=cpu
	export WHISPERMATH_WHISPER_COMPUTE_TYPE=int8
	export WHISPERMATH_DECODER_MODEL=vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724
	export WHISPERMATH_DECODER_DEVICE=auto
	```
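A minimal sketch of how these variables could be read, assuming the defaults equal the values exported above and that `auto` means "use CUDA when available, else CPU". Both assumptions are unconfirmed by `app.py`, and `WhisperMathConfig`, `load_config`, and `resolve_device` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WhisperMathConfig:
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    decoder_model: str
    decoder_device: str


def load_config(env: Optional[Mapping[str, str]] = None) -> WhisperMathConfig:
    """Read WHISPERMATH_* variables; the fallback values are assumed defaults."""
    env = os.environ if env is None else env
    return WhisperMathConfig(
        whisper_model=env.get("WHISPERMATH_WHISPER_MODEL", "small.en"),
        whisper_device=env.get("WHISPERMATH_WHISPER_DEVICE", "cpu"),
        whisper_compute_type=env.get("WHISPERMATH_WHISPER_COMPUTE_TYPE", "int8"),
        decoder_model=env.get(
            "WHISPERMATH_DECODER_MODEL",
            "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
        ),
        decoder_device=env.get("WHISPERMATH_DECODER_DEVICE", "auto"),
    )


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map "auto" to cuda when a GPU is visible, otherwise cpu (assumed semantics)."""
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested
```

In a real process, `cuda_available` would typically come from `torch.cuda.is_available()`; passing it in keeps the function testable on machines without PyTorch.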

	Whisper model options:

```text
tiny.en     Fastest, weakest transcription
base.en     Better quality, still fairly light
small.en    Good CPU default for the public Space
medium.en   Better transcription, slower and heavier
```

	For local testing with medium Whisper:

	```bash
	WHISPERMATH_WHISPER_MODEL=medium.en \
	/Users/vaibhav/Desktop/beyond/whispermath/phase-3-decoder/.venv/bin/python \
	-m uvicorn app:app --host 127.0.0.1 --port 8766
	```

	For the free Hugging Face CPU Space, keep:

	```bash
	WHISPERMATH_WHISPER_MODEL=small.en
	WHISPERMATH_DECODER_DEVICE=cpu
	```

	## API Endpoints

	### Health

	```bash
	curl http://127.0.0.1:8766/api/health
	```

	Example:

```json
{
  "status": "ok",
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724",
  "decoder_device": "cpu"
}
```

	### Decode Text Only

	Use this when you want to test the ByT5 decoder without audio:

	```bash
	curl -X POST http://127.0.0.1:8766/api/decode \
	-H "Content-Type: application/json" \
	-d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
	```

	Example response:

```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
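The same decode call can be made from Python with only the standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON body shown in the curl example; `build_decode_request` and `decode_text` are hypothetical helper names.

```python
import json
import urllib.request


def build_decode_request(base_url: str, text: str, num_beams: int = 1,
                         max_new_tokens: int = 128) -> urllib.request.Request:
    """Build the same POST /api/decode request as the curl example."""
    body = json.dumps({
        "text": text,
        "num_beams": num_beams,
        "max_new_tokens": max_new_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/decode",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_text(base_url: str, text: str, **kwargs) -> dict:
    """Send the request to a running server and parse the JSON response."""
    request = build_decode_request(base_url, text, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)
```

For example, `decode_text("http://127.0.0.1:8766", "x squared minus y squared equals four")["math_text"]` returns whatever the local decoder produces; building the request separately makes it easy to inspect without a server.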

	### Transcribe Audio

	```bash
	curl -X POST http://127.0.0.1:8766/api/transcribe \
	-F audio=@/path/to/audio.wav \
	-F num_beams=4 \
	-F max_new_tokens=256
	```

	Example response:

```json
{
  "transcript": "integral from zero to pi of sine x dx.",
  "math_text": "\\int_0^\\pi \\sin x dx.",
  "segments": [
    {
      "start": 0.0,
      "end": 2.72,
      "text": "integral from zero to pi of sine x dx."
    }
  ],
  "whisper_model": "small.en",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
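For scripted testing without curl, the multipart upload can be built with the standard library. This sketch mirrors the `-F` fields above. The function names are hypothetical, and sending the audio as `application/octet-stream` is an assumption that the backend does not depend on the part's content type.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_transcribe_request(base_url: str, audio_path: str, num_beams: int = 4,
                             max_new_tokens: int = 256) -> urllib.request.Request:
    """Encode the audio file and form fields as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    path = Path(audio_path)
    parts = []
    # Plain form fields: one part each, value as text.
    for name, value in (("num_beams", num_beams), ("max_new_tokens", max_new_tokens)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # File field named "audio", matching -F audio=@file.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; '
        f'filename="{path.name}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
    )
    parts.append(path.read_bytes())
    # Closing delimiter ends the multipart body.
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/transcribe",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe_file(base_url: str, audio_path: str, **kwargs) -> dict:
    """Upload the file to a running server and return the parsed JSON response."""
    request = build_transcribe_request(base_url, audio_path, **kwargs)
    with urllib.request.urlopen(request, timeout=600) as resp:
        return json.load(resp)
```

If adding a dependency is acceptable, `requests` with its `files=` argument produces the same kind of body in fewer lines.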

	## Deploy To Hugging Face Spaces

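This folder is ready to deploy as a Docker Space. The `sdk: docker` and `app_port: 7860` fields in the front matter at the top of this README tell Spaces to build the `Dockerfile` and route traffic to port 7860.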

	### 1. Login

	Do not commit or paste tokens into files.

	```bash
	huggingface-cli login
	```

Alternatively, log in from Python with `huggingface_hub.login()`; the `HfApi` calls below reuse the token it caches.

	### 2. Create The Space

```python
from huggingface_hub import HfApi

api = HfApi()
api.create_repo(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    space_sdk="docker",
    private=False,
    exist_ok=True,
)
```

	### 3. Upload The Folder

```python
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    repo_id="vibhuiitj/whispermath-webdemo",
    repo_type="space",
    folder_path="/Users/vaibhav/Desktop/beyond/whispermath/webdemo",
    path_in_repo=".",
    ignore_patterns=[
        "__pycache__/*",
        "*.pyc",
        ".DS_Store",
        ".venv/*",
        "audio/*",
        "outputs/*",
    ],
    commit_message="Deploy WhisperMath web demo",
)
```
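Before uploading, it can help to preview which files the ignore patterns keep, so a stray `.venv/` or recorded audio never reaches the Space. The sketch below approximates the filtering with `fnmatch` against repo-relative paths. That is an assumption about how `huggingface_hub` applies `ignore_patterns`, and `upload_folder` may also skip a few paths by default (such as `.git/`), so treat the result as a preview rather than the exact upload list.

```python
import os
from fnmatch import fnmatch

# Same patterns as the upload_folder call above.
IGNORE_PATTERNS = ["__pycache__/*", "*.pyc", ".DS_Store", ".venv/*", "audio/*", "outputs/*"]


def files_to_upload(folder: str, ignore_patterns=IGNORE_PATTERNS) -> list:
    """List repo-relative paths that survive the ignore patterns (fnmatch semantics)."""
    kept = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # Use forward slashes so patterns behave the same on every OS.
            rel = os.path.relpath(os.path.join(root, name), folder).replace(os.sep, "/")
            if not any(fnmatch(rel, pattern) for pattern in ignore_patterns):
                kept.append(rel)
    return sorted(kept)
```

Note that `fnmatch`'s `*` also matches `/`, which is why `.venv/*` excludes the whole virtualenv tree, while `.DS_Store` only matches the file at the top level.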

	### 4. Check Runtime

	```python
	from huggingface_hub import HfApi

	runtime = HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo")
	print(runtime)
	```

Once the build finishes, the printed runtime should include:

	```text
	stage='RUNNING'
	hardware='cpu-basic'
	requested_hardware='cpu-basic'
	```
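Because a Docker build and the first model download can take several minutes, a polling loop saves re-running the check by hand. In this sketch, `wait_for_stage` is a hypothetical helper; the failure stage names are taken from `huggingface_hub`'s `SpaceStage` values as an assumption, so check them against your installed version.

```python
import time
from typing import Callable

# Assumed terminal failure stages; verify against huggingface_hub.SpaceStage.
FAILED_STAGES = {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR", "NO_APP_FILE"}


def wait_for_stage(get_stage: Callable[[], object], target: str = "RUNNING",
                   timeout_s: float = 900, poll_s: float = 15,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the Space reaches `target`, hits a failure stage, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        raw = get_stage()
        # Accept either a plain string or an enum member with a .value.
        stage = getattr(raw, "value", raw)
        if stage == target or stage in FAILED_STAGES:
            return stage
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Space still in {stage} after {timeout_s}s")
        sleep(poll_s)
```

Usage: `wait_for_stage(lambda: HfApi().get_space_runtime("vibhuiitj/whispermath-webdemo").stage)`. Returning on failure stages, rather than waiting for the timeout, surfaces build errors quickly.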

	### 5. Test The Deployed Space

	Health:

	```bash
	curl https://vibhuiitj-whispermath-webdemo.hf.space/api/health
	```

	Text decode:

	```bash
	curl -L -X POST https://vibhuiitj-whispermath-webdemo.hf.space/api/decode \
	-H "Content-Type: application/json" \
	-d '{"text":"x squared minus y squared equals four","num_beams":1,"max_new_tokens":128}'
	```

	Expected:

```json
{
  "transcript": "x squared minus y squared equals four",
  "math_text": "x^2-y^2=4",
  "decoder_model": "vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724"
}
```
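The two checks can be bundled into a small smoke test to run after each deploy. This is a sketch: `smoke_test` and `_request_json` are hypothetical names, and the expected `math_text` is the value from the example above, so update it if the decoder checkpoint changes.

```python
import json
import urllib.request
from typing import Callable, List

SPACE_URL = "https://vibhuiitj-whispermath-webdemo.hf.space"


def _request_json(url: str, payload=None) -> dict:
    """GET when payload is None, otherwise POST it as JSON; return the parsed body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    # Generous timeout: a sleeping free-CPU Space may need to wake up first.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


def smoke_test(base_url: str = SPACE_URL, fetch: Callable = _request_json) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    health = fetch(base_url + "/api/health")
    if health.get("status") != "ok":
        problems.append(f"health status is {health.get('status')!r}")
    decoded = fetch(
        base_url + "/api/decode",
        {"text": "x squared minus y squared equals four", "num_beams": 1, "max_new_tokens": 128},
    )
    if decoded.get("math_text") != "x^2-y^2=4":
        problems.append(f"unexpected math_text {decoded.get('math_text')!r}")
    return problems
```

The `fetch` parameter exists so the logic can be exercised offline with canned responses; against the live Space, call `smoke_test()` with no arguments.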

	## Docker Notes

	The Docker image:

	- Uses `python:3.11-slim`
	- Installs `libgomp1`, needed by some CPU inference dependencies
	- Runs on port `7860`, the Hugging Face Spaces default
	- Sets `WHISPERMATH_WHISPER_MODEL=small.en`
	- Sets `WHISPERMATH_DECODER_DEVICE=cpu`
	- Sets `HF_HUB_DISABLE_XET=1`

	`HF_HUB_DISABLE_XET=1` is included because local testing showed large model downloads could get stuck with incomplete Xet-backed cache files.

	## Troubleshooting

	### Space Takes A Long Time To Start

	The first start downloads:

	- `Systran/faster-whisper-small.en`
	- `vibhuiitj/byt5-base-whispermath-a100-checkpoint-10724`

	On free CPU, startup can take a few minutes.

	### Audio Transcription Is Wrong

	Try these in order:

	1. Speak shorter phrases.
	2. Use clearer operator words, for example `over` instead of `by`.
	3. Check the editable transcript box.
	4. Correct the transcript manually.
	5. Click Decode Transcript.
6. Compare Whisper models locally by setting `WHISPERMATH_WHISPER_MODEL`; `medium.en` transcribes better than `small.en` but is slower and heavier.

	### ByT5 Output Is Wrong But Transcript Is Correct

A correct transcript that still decodes to the wrong math points at the ByT5 decoder, which likely needs more targeted training data for that phrasing. Common weak phrases include:

	```text
	by / divided by / over
	whole square
	derivative of ...
	limit as ...
	fraction with grouped numerator and denominator
	```

	Use the editable transcript box to collect failure cases.
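Once a handful of corrected transcripts have been gathered, they can be replayed in batch and the mismatches saved for later training. This is a hypothetical sketch: `decode` is any callable that sends a transcript to `POST /api/decode` and returns its `math_text`, and the JSON Lines layout is an assumed format, not one the project defines.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Tuple


def collect_failures(cases: Iterable[Tuple[str, str]], decode: Callable[[str], str],
                     out_path: str) -> int:
    """Decode each (transcript, expected_math) pair and append mismatches as JSON lines."""
    failures = 0
    with Path(out_path).open("a", encoding="utf-8") as f:
        for transcript, expected in cases:
            got = decode(transcript)
            # Whitespace-insensitive comparison (an assumption; tighten if needed).
            if got.replace(" ", "") != expected.replace(" ", ""):
                record = {"transcript": transcript, "expected": expected, "got": got}
                f.write(json.dumps(record) + "\n")
                failures += 1
    return failures
```

Appending rather than overwriting lets failure cases accumulate across sessions, which suits the weak-phrase categories listed above.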

	### KaTeX Rendering Fails

Even when rendering fails, the app still shows the model's output under Raw ByT5 Output. If that raw output is malformed LaTeX-like text (for example, an unclosed brace), KaTeX may render an error-colored expression or show fallback text.
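To separate malformed decoder output from genuine KaTeX limitations when triaging failures, a cheap structural check can flag the most common problem, unbalanced delimiters. This sketch is a hypothetical offline helper, not something the app is documented to run. It only checks `{`/`}` nesting and `\left`/`\right` counts, so passing it does not guarantee KaTeX will render the expression.

```python
import re

# Match \left and \right as whole commands, not prefixes of \leftarrow or \rightarrow.
_LEFT = re.compile(r"\\left(?![A-Za-z])")
_RIGHT = re.compile(r"\\right(?![A-Za-z])")


def latex_looks_balanced(latex: str) -> bool:
    """Return False when braces or \\left/\\right pairs are clearly unbalanced."""
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            # Skip the backslash and the next character so \{ and \} are not counted.
            i += 2
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0 and len(_LEFT.findall(latex)) == len(_RIGHT.findall(latex))
```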

	### Medium Whisper Hangs During Download

If a local download stalls and leaves an incomplete cache file, re-download the model with Xet disabled and a single download worker:

	```bash
	HF_HUB_DISABLE_XET=1 python - <<'PY'
	from huggingface_hub import snapshot_download
	snapshot_download("Systran/faster-whisper-medium.en", max_workers=1)
	PY
	```

	Then restart:

	```bash
	HF_HUB_DISABLE_XET=1 WHISPERMATH_WHISPER_MODEL=medium.en \
	python -m uvicorn app:app --host 127.0.0.1 --port 8766
	```
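To confirm whether a stalled download actually left partial files behind, list the `*.incomplete` blobs in the hub cache. This sketch assumes the standard cache layout (`models--<org>--<name>/blobs/`) and a simplified cache location: it honors `HF_HUB_CACHE` but ignores `HF_HOME` and other overrides. `find_incomplete_downloads` is a hypothetical helper; delete the files it finds only while no download is running.

```python
import os
from pathlib import Path
from typing import List, Optional


def find_incomplete_downloads(repo_id: str, cache_dir: Optional[str] = None) -> List[Path]:
    """List *.incomplete blob files for a model repo in the Hugging Face hub cache."""
    if cache_dir is None:
        # Simplified default location; huggingface_hub supports more overrides.
        cache_dir = os.environ.get(
            "HF_HUB_CACHE", os.path.expanduser("~/.cache/huggingface/hub")
        )
    repo_folder = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    # A missing folder simply yields no matches.
    return sorted((repo_folder / "blobs").glob("*.incomplete"))
```

For example, `find_incomplete_downloads("Systran/faster-whisper-medium.en")` returning an empty list suggests the cache is clean and the hang lies elsewhere.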

	## Security

	Never commit Hugging Face tokens or API keys into this folder.

If a token is pasted into a chat or terminal history by mistake, revoke it in your Hugging Face access token settings and create a new one.