---
library_name: pytorch
license: mit
pipeline_tag: automatic-speech-recognition
language:
- vi
- en
tags:
- automatic-speech-recognition
- invoice-extraction
- speech
---

# ASR + Invoice Extraction Server

Standalone packaging of `Server_conformer.py` that transcribes audio and extracts invoice JSON from the transcript text. This folder also includes a copy of the trained RNNT checkpoint for convenience.

## What’s inside

- `Server_conformer.py`, `Speech2text.py`, `InformationExtractor.py`
- `chunkformer/` code
- `chunkformer-model/` (the RNNT checkpoint)
- `requirements.txt`

## Prerequisites

- Python 3.9+ and a CUDA GPU (required in practice for Qwen invoice extraction; CPU inference is extremely slow)
- A Hugging Face token with access to the models you use (`HF_TOKEN`)
- The Chunkformer RNNT checkpoint at `chunkformer-model` (a copy is included in this folder). Update `CHUNKFORMER_MODEL_PATH` if you place it elsewhere.

## Setup

```bash
cd Speech2Invoice
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Configure environment

Create a `.env` file (or export the environment variables) with at least:

```
PORT=8000
USE_NGROK=false
HF_TOKEN=your_hf_token_here
CHUNKFORMER_MODEL_PATH=chunkformer-model
LOG_LEVEL=DEBUG
DEBUG=true

# Optional ngrok
NGROK_AUTHTOKEN=
NGROK_REGION=ap

# Optional invoice LLM overrides (defaults are fast)
IE_LLM_MODEL_ID=Qwen/Qwen1.5-7B-Chat
IE_MAX_NEW_TOKENS=256
IE_DO_SAMPLE=false
IE_TEMPERATURE=0.0
IE_TOP_P=0.8
```

If you move the model elsewhere, set `CHUNKFORMER_MODEL_PATH` to that directory.

## Run

```bash
python3 Server_conformer.py
```

## Endpoints

- `POST /transcribe`: `multipart/form-data` request carrying an audio file (`wav`, `mp3`, `m4a`, `ogg`, `webm`). Returns JSON with `final_result` and `full_transcription`.
- `POST /ticket`: JSON body `{"full_transcription": "..."}` containing the transcript text (for example, the `full_transcription` value returned by `/transcribe`). Returns the invoice JSON inferred by Qwen.

## Notes

- The invoice extractor requires a GPU and downloads the model from Hugging Face on first run. Set `IE_LLM_MODEL_ID` to a smaller model for faster extraction.
- Model weights for the RNNT checkpoint are included in `chunkformer-model/`. Because these files are large, consider tracking them with Git LFS if you plan to push to a remote.

## Contact

For questions or controlled access requests to Speech2Invoice:

- Duc Dat Pham
- Email: [ducdatit2002@gmail.com](mailto:ducdatit2002@gmail.com)
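
## Example client (sketch)

The `/transcribe` and `/ticket` endpoints can be chained from a client: upload audio, then send the returned `full_transcription` text for invoice extraction. The following is a minimal standard-library sketch, not part of this package. It assumes the server listens on `http://localhost:8000` (the `PORT` from the example `.env`) and that the multipart field for the audio upload is named `file`; this README does not name the field, so check `Server_conformer.py` before relying on it. The helper names (`build_transcribe_request`, `build_ticket_request`, `send`, `audio_to_invoice`) are hypothetical.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumes PORT=8000 from the example .env

# Assumption: the audio upload field is named "file"; verify in Server_conformer.py.
AUDIO_FIELD = "file"


def build_transcribe_request(base_url: str, filename: str, audio: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST to /transcribe carrying one audio file."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{AUDIO_FIELD}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=head + audio + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_ticket_request(base_url: str, transcript: str) -> urllib.request.Request:
    """Build a JSON POST to /ticket with the transcript in `full_transcription`."""
    body = json.dumps({"full_transcription": transcript}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ticket",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (non-2xx raises HTTPError)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def audio_to_invoice(path: str, base_url: str = BASE_URL) -> dict:
    """Transcribe an audio file, then extract invoice JSON from the transcript."""
    with open(path, "rb") as f:
        audio = f.read()
    transcribed = send(build_transcribe_request(base_url, os.path.basename(path), audio))
    return send(build_ticket_request(base_url, transcribed["full_transcription"]))
```

With the server running, `audio_to_invoice("sample.wav")` returns whatever JSON `/ticket` produces for that recording. Building each request separately from sending it keeps the multipart encoding easy to inspect without a live server.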