---
library_name: pytorch
license: mit
pipeline_tag: automatic-speech-recognition
language:
  - vi
  - en
tags:
  - automatic-speech-recognition
  - invoice-extraction
  - speech
---

# ASR + Invoice Extraction Server

Standalone packaging of `Server_conformer.py`, a server that transcribes audio and extracts invoice data as JSON from the transcript. This folder also includes a copy of the trained RNNT checkpoint for convenience.

## What’s inside
- `Server_conformer.py`, `Speech2text.py`, `InformationExtractor.py`
- `chunkformer/` code
- `chunkformer-model/` (trained Chunkformer RNNT checkpoint)
- `requirements.txt`

## Prerequisites
- Python 3.9+ and a CUDA GPU (needed for practical Qwen invoice extraction; on CPU it is extremely slow)
- Hugging Face token with access to the models you use (`HF_TOKEN`)
- Chunkformer RNNT checkpoint available at `chunkformer-model` (copied into this folder). Update `CHUNKFORMER_MODEL_PATH` if you place it elsewhere.

## Setup
```bash
cd Speech2Invoice
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Configure environment
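Create a `.env` file (or export the equivalent environment variables) with settings like the following; the ngrok and invoice LLM entries are optional: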
```
PORT=8000
USE_NGROK=false
HF_TOKEN=your_hf_token_here
CHUNKFORMER_MODEL_PATH=chunkformer-model
LOG_LEVEL=DEBUG
DEBUG=true

# Optional ngrok
NGROK_AUTHTOKEN=
NGROK_REGION=ap

# Optional invoice LLM overrides (the defaults favor speed)
IE_LLM_MODEL_ID=Qwen/Qwen1.5-7B-Chat
IE_MAX_NEW_TOKENS=256
IE_DO_SAMPLE=false
IE_TEMPERATURE=0.0
IE_TOP_P=0.8
```

If you move the model elsewhere, set `CHUNKFORMER_MODEL_PATH` to that directory.

## Run
```bash
python3 Server_conformer.py
```

## Endpoints
- `POST /transcribe`: `multipart/form-data` request with an audio file (`wav`, `mp3`, `m4a`, `ogg`, or `webm`). Returns JSON with `final_result` and `full_transcription`.
- `POST /ticket`: JSON body `{"full_transcription": "<text>"}`. Returns the invoice fields extracted by Qwen as JSON.
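
A minimal client sketch for the two endpoints, using only the Python standard library. It assumes the server is reachable at `http://localhost:8000` (the `PORT` value above) and that `/transcribe` reads the upload from a form field named `file`; that field name is an assumption, so check `Server_conformer.py` for the actual one. `build_transcribe_request`, `build_ticket_request`, and `send` are hypothetical helpers.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: PORT=8000 from the .env example


def build_transcribe_request(audio_path: str, field: str = "file") -> Request:
    """Build a multipart/form-data POST to /transcribe.

    `field` is an assumed form-field name; adjust it to match the server.
    """
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return Request(
        f"{BASE_URL}/transcribe",
        data=head + path.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def build_ticket_request(full_transcription: str) -> Request:
    """Build a JSON POST to /ticket with the documented body shape."""
    body = json.dumps({"full_transcription": full_transcription}, ensure_ascii=False)
    return Request(
        f"{BASE_URL}/ticket",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request) -> dict:
    """Send a request to the running server and decode its JSON response."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

To chain the endpoints, pass the transcript from the first response into the second: `send(build_ticket_request(send(build_transcribe_request("sample.wav"))["full_transcription"]))`, where `sample.wav` is a placeholder file name.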

## Notes
- The invoice extractor needs a GPU, and its LLM weights are downloaded from Hugging Face on first run. For faster inference, point `IE_LLM_MODEL_ID` to a smaller model.
- Model weights for the RNNT checkpoint are included in `chunkformer-model/`. If you plan to push this folder to a remote, consider tracking the large weight files with Git LFS.
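
A minimal sketch of how the `IE_*` overrides could map onto keyword arguments for Hugging Face `transformers` `generate()`, assuming the extractor uses that API (this README does not confirm it). `generation_kwargs` is a hypothetical helper whose defaults mirror the example `.env` values. With `IE_DO_SAMPLE=false`, decoding is greedy and ignores `temperature` and `top_p`, so the sketch omits them in that case; when sampling is enabled, it requires a positive temperature.

```python
import os
from typing import Any, Dict, Mapping


def _env_bool(value: str) -> bool:
    """Interpret 'true'/'false'-style strings from the environment."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def generation_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Translate IE_* environment overrides into generate() keyword arguments."""
    kwargs: Dict[str, Any] = {
        "max_new_tokens": int(env.get("IE_MAX_NEW_TOKENS", "256")),
        "do_sample": _env_bool(env.get("IE_DO_SAMPLE", "false")),
    }
    if kwargs["do_sample"]:
        temperature = float(env.get("IE_TEMPERATURE", "0.0"))
        if temperature <= 0:
            # Sampling divides logits by the temperature, so it must be positive.
            raise ValueError("IE_TEMPERATURE must be > 0 when IE_DO_SAMPLE=true")
        kwargs["temperature"] = temperature
        kwargs["top_p"] = float(env.get("IE_TOP_P", "0.8"))
    # Greedy decoding (do_sample=False) ignores temperature/top_p, so they are omitted.
    return kwargs
```

The result would be splatted into a call such as `model.generate(**inputs, **generation_kwargs())`, where `model` and `inputs` come from the extractor's own loading code.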

## Contact

For questions or controlled access requests to Speech2Invoice:

* Duc Dat Pham
* Email: [ducdatit2002@gmail.com](mailto:ducdatit2002@gmail.com)