feat(audio): demo clips + Translate + Whisper warm + CT2 mirror
- demo_audio/ ships two curated ANC preeclampsia recordings (20s + 52s)
with a manifest. Voice tab gets a "Load voice example" dropdown that
fetches the file, populates the player, and runs extraction. Two
reviewer-facing reasons: (a) the text-only demo silently undersold voice;
(b) the audio path is the actual product moat.
- /api/audio-examples serves the manifest with synthesized URLs.
/audio/* static-mounts the bundle, ordered before the SPA catch-all.
.dockerignore allowlists demo_audio so the blanket *.ogg excludes
don't silently drop the curated clips at build time.
- /api/translate exposes Hindi/Hinglish -> English via the resident
Gemma. New Translate button below the transcript card, on-demand so
the main extraction path stays fast. ~3-5s on a hot T4.
- Whisper now eager-loads from the FastAPI startup hook. Space marks
READY only when both LLM and ASR are resident; first user audio
request no longer pays a ~60s cold load. WHISPER_MODEL env var
defaults to Tushar9802/whisper-large-v2-hindi-ct2 (CT2 mirror of
collabora/whisper-large-v2-hindi -- faster-whisper requires CT2
format, and the source repo is transformers). Local dev with a
models/whisper-hindi-ct2/ directory takes precedence over the env.
- /api/health returns the LLM and ASR model names; status line surfaces
both.
- Ollama keep_alive at both call sites now reads
OLLAMA_KEEP_ALIVE from env so the 24h container default actually
wins. Previously the hardcoded "10m" silently overrode the env.
- README touch-ups: ASR row notes the CT2 mirror, HF Space cache
section reflects eager-load + persistent caching.
- .dockerignore +5 -0
- .gitattributes +3 -0
- Dockerfile +3 -1
- README.md +2 -2
- api.py +55 -2
- app.py +64 -15
- demo_audio/anc_preeclampsia_full.ogg +3 -0
- demo_audio/anc_preeclampsia_short.ogg +3 -0
- demo_audio/manifest.json +20 -0
- frontend/src/App.css +12 -0
- frontend/src/App.jsx +107 -1
@@ -72,6 +72,11 @@ app.log
 *.mpeg
 *.flac
 !tests/fixtures/*.wav
+# Curated reviewer-facing demo clips must reach the image — these are tiny
+# (≈100 KB each) and served by the FastAPI static mount at /audio/*.
+!demo_audio/*.ogg
+!demo_audio/*.wav
+!demo_audio/*.mp3
 
 # Secrets
 .env
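The allowlist lines above only help because ignore files are evaluated top to bottom, with the last matching rule deciding the outcome. The sketch below models that rule in plain Python. `is_ignored` and `RULES` are hypothetical names, and `fnmatchcase` is only an approximation of Docker's own matcher (for example, it lets `*` cross path separators), so treat it as an illustration of the ordering, not of Docker's exact pattern syntax.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if `path` ends up excluded under a last-match-wins rule list."""
    ignored = False
    for raw in patterns:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        negated = line.startswith("!")
        pattern = line[1:] if negated else line
        if fnmatchcase(path, pattern):
            # A later matching rule overrides any earlier decision.
            ignored = not negated
    return ignored


# Illustrative subset of rules, not the full .dockerignore from the commit.
RULES = ["*.ogg", "*.wav", "*.mp3", "!tests/fixtures/*.wav", "!demo_audio/*.ogg"]

print(is_ignored("demo_audio/anc_preeclampsia_short.ogg", RULES))
print(is_ignored("recordings/raw.ogg", RULES))
```

Under this model, moving `!demo_audio/*.ogg` above `*.ogg` would re-exclude the clips, which is why the exceptions sit after the blanket excludes.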
@@ -0,0 +1,3 @@
+*.ogg filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
+*.mp3 filter=lfs diff=lfs merge=lfs -text
@@ -63,6 +63,7 @@ COPY app.py api.py ./
 COPY src/ ./src/
 COPY configs/ ./configs/
 COPY scripts/ ./scripts/
+COPY demo_audio/ ./demo_audio/
 COPY FAILURES.md JUDGE_BRIEF.md README.md ./
 COPY entrypoint.sh ./
 RUN chmod +x entrypoint.sh
@@ -75,7 +76,8 @@ ENV PORT=7860 \
     OLLAMA_MODEL=gemma4:e4b-it-q4_K_M \
     OLLAMA_MODELS=/data/.ollama/models \
     HF_HOME=/data/.cache/huggingface \
-    OLLAMA_KEEP_ALIVE=24h
+    OLLAMA_KEEP_ALIVE=24h \
+    WHISPER_MODEL=Tushar9802/whisper-large-v2-hindi-ct2
 
 EXPOSE 7860
 
@@ -65,7 +65,7 @@ The pipeline uses a hybrid design: form extraction via `format="json"` (proven p
 
 | Component | Model | Size | Role | Deployment |
 |-----------|-------|------|------|------------|
-| ASR (workstation path only) | collabora/whisper-large-v2-hindi | ~1.5 GB | Hindi speech → text via faster-whisper/CTranslate2 | Workstation |
+| ASR (workstation path only) | collabora/whisper-large-v2-hindi (served as the CTranslate2 mirror Tushar9802/whisper-large-v2-hindi-ct2 — faster-whisper requires CT2 format) | ~1.5 GB | Hindi speech → text via faster-whisper/CTranslate2 | Workstation |
 | Normalization | src/hindi_normalize.py | — | Hindi number words → digits, medical term mapping | Shared (Python server-side; JS port for phone) |
 | Clinical Extraction (health-center mode, audio-in) | Gemma 4 E4B (Q4_K_M via Ollama) | ~5 GB | Function calling: form extraction + danger signs + referral | Workstation (GPU) |
 | Clinical Extraction (field mode, text-in) | Gemma 4 E2B (INT4 via Cactus SDK) | ~4.4 GB download / ~6.3 GB on-device extracted (multimodal package includes audio + vision encoders that the text-in path does not use) | Same extraction schema, plain-JSON mode (E2B INT4 does not reliably emit OpenAI-style `tool_calls`) | Android (ARM, Snapdragon 7+ Gen 1 or newer, 8 GB RAM, ~7 GB free storage for the one-time install) |
@@ -289,7 +289,7 @@ git push hf master
 # ~7 GB and the first request waits 3–5 min)
 ```
 
-On first boot the container pulls `gemma4:e4b-it-q4_K_M` into the persistent volume (~3 min)
+On first boot the container pulls `gemma4:e4b-it-q4_K_M` into the persistent volume (~3 min) and warms it with a one-token generate so the first user request lands hot. The FastAPI startup hook eagerly loads Whisper-Large CT2 from `Tushar9802/whisper-large-v2-hindi-ct2` (~3 GB, cached under `$HF_HOME` on the persistent volume after the first boot). The Space only reports ready when both models are resident. Subsequent restarts read everything from `/data` and are fast.
 
 **Subsequent updates:** `git push hf master` after any code change; HF rebuilds and redeploys.
 
@@ -34,6 +34,9 @@ from app import (
     init_schemas,
     validate_form_output,
     postprocess_transcript,
+    translate_to_english,
+    warm_whisper,
+    WHISPER_MODEL,
 )
 
 app = FastAPI(title="Sakhi API", version="1.0.0")
@@ -46,10 +49,17 @@ app.add_middleware(
     allow_headers=["*"],
 )
 
-#
+# Startup: load schemas + pre-warm Whisper so the Space only reports ready
+# when the audio path is hot. Whisper load is wrapped in try/except — if the
+# eager load fails (no GPU, network blip), fall back to lazy loading on
+# first audio request instead of blocking the whole boot.
 @app.on_event("startup")
 def startup():
     init_schemas()
+    try:
+        warm_whisper()
+    except Exception as e:
+        print(f"[startup] WARN: Whisper pre-warm failed ({e!r}); falling back to lazy load")
 
 
 # ── Models ──
@@ -71,6 +81,10 @@ class TextRequest(BaseModel):
     metadata: Optional[PatientMetadata] = None
 
 
+class TranslateRequest(BaseModel):
+    text: str
+
+
 class ExtractionResult(BaseModel):
     visit_type: str
     form: Optional[dict] = None
@@ -94,7 +108,11 @@ def _metadata_dict(meta):
 # ── Endpoints ──
 @app.get("/api/health")
 def health():
-    return {
+    return {
+        "status": "ok",
+        "model": os.environ.get("OLLAMA_MODEL", "gemma4:e4b-it-q4_K_M"),
+        "whisper": WHISPER_MODEL,
+    }
 
 
 @app.get("/api/examples")
@@ -107,6 +125,34 @@ def examples():
     # index 1 = "ANC Visit — Preeclampsia (DANGER)" — best for demo (has danger signs)
 
 
+@app.post("/api/translate")
+def translate(req: TranslateRequest):
+    """Hindi / Hinglish → English. Uses the same Gemma model already in VRAM,
+    so the cost is one extra ~3-5s LLM call. Reviewer-facing convenience;
+    never invoked from the main extraction path."""
+    t0 = time.time()
+    english = translate_to_english(req.text)
+    return {"english": english, "time_s": round(time.time() - t0, 2)}
+
+
+_DEMO_AUDIO_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "demo_audio")
+
+
+@app.get("/api/audio-examples")
+def audio_examples():
+    """Curated voice clips bundled into the image. Returns playable URLs
+    relative to the Space origin, so the frontend can both <audio src=...>
+    them and re-POST them to /api/process-audio-stream."""
+    manifest_path = os.path.join(_DEMO_AUDIO_DIR, "manifest.json")
+    if not os.path.isfile(manifest_path):
+        return []
+    with open(manifest_path, "r", encoding="utf-8") as f:
+        entries = json.load(f)
+    for e in entries:
+        e["url"] = f"/audio/{e['file']}"
+    return entries
+
+
 @app.post("/api/process-text", response_model=ExtractionResult)
 def process_text(req: TextRequest):
     t_total = time.time()
@@ -334,6 +380,13 @@ async def process_audio_stream(
     return StreamingResponse(generate(), media_type="text/event-stream")
 
 
+# Serve curated demo audio under /audio/* so the frontend can <audio src=...>
+# them. Must be mounted BEFORE the SPA catch-all below; otherwise the
+# StaticFiles for `/` would swallow these paths.
+if os.path.isdir(_DEMO_AUDIO_DIR):
+    app.mount("/audio", StaticFiles(directory=_DEMO_AUDIO_DIR), name="demo_audio")
+
+
 # Serve built React frontend at / when dist exists (unified desktop UI for health centers).
 # Must be mounted AFTER all /api/* routes so they take priority.
 _FRONTEND_DIST = os.path.join(os.path.dirname(os.path.abspath(__file__)), "frontend", "dist")
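The startup hook in this file pairs an eager load with a lazy fallback: boot tries to warm the model, and if that fails the first request retries. A minimal stdlib-only sketch of that pattern follows. `load_model`, `warm`, `startup`, and `handle_request` are hypothetical stand-ins (the real loader is the `WhisperModel(...)` constructor), so this shows the control flow only, not the project's code.

```python
_model = None  # process-wide cache, playing the role of _whisper_model


def load_model():
    """Hypothetical stand-in for an expensive model constructor."""
    return object()


def warm(loader=load_model):
    """Load once and cache; later calls return the same instance."""
    global _model
    if _model is None:
        _model = loader()
    return _model


def startup(loader=load_model):
    """Attempt the eager load at boot; on failure leave the cache empty."""
    try:
        warm(loader)
        return True
    except Exception as exc:
        print(f"[startup] WARN: pre-warm failed ({exc!r}); will load lazily")
        return False


def handle_request(loader=load_model):
    """Request path: warm() is effectively a no-op if startup succeeded."""
    return warm(loader)
```

Because a failed eager load leaves the cache unset, the request path retries the load instead of serving a half-initialized model, at the cost of the cold-load latency the eager path was meant to hide.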
@@ -27,6 +27,15 @@ OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "gemma4:e4b-it-q4_K_M")
 USE_OLLAMA = os.environ.get("USE_OLLAMA", "1") == "1"
 USE_FUNCTION_CALLING = os.environ.get("USE_FUNCTION_CALLING", "1") == "1"
 
+# Whisper config. Default = the CTranslate2-converted mirror of collabora's
+# Hindi fine-tune of whisper-large-v2 (selected after session 19's real-voice
+# validation pass). faster-whisper requires CT2 format; the original
+# collabora/ repo is transformers format and won't load directly.
+# Override with WHISPER_MODEL for evals against other variants. Local dev
+# with a pre-converted CT2 directory at models/whisper-hindi-ct2/ takes
+# precedence over this env var — see warm_whisper().
+WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "Tushar9802/whisper-large-v2-hindi-ct2")
+
 # System prompts (same as training)
 FORM_SYSTEM_PROMPT = (
     "You are a clinical data extraction system for India's ASHA health worker program. "
@@ -189,20 +198,28 @@ from src.hindi_normalize import normalize_transcript as postprocess_transcript
 
 _whisper_model = None
 
-def
-    """
+def warm_whisper():
+    """Eagerly load the Whisper model into VRAM. Idempotent — safe to call
+    multiple times; subsequent calls return the cached singleton. Called from
+    FastAPI's startup hook so the first user audio request lands hot."""
     global _whisper_model
-    if _whisper_model is None:
-
-
-
-
-
-
-
-
-
-
+    if _whisper_model is not None:
+        return _whisper_model
+    from faster_whisper import WhisperModel
+    ct2_path = os.path.join(os.path.dirname(__file__), "models", "whisper-hindi-ct2")
+    if os.path.exists(ct2_path):
+        print(f"[ASR] Loading CTranslate2 model from {ct2_path}...")
+        _whisper_model = WhisperModel(ct2_path, device="cuda", compute_type="float16")
+    else:
+        print(f"[ASR] Loading {WHISPER_MODEL} from HuggingFace Hub...")
+        _whisper_model = WhisperModel(WHISPER_MODEL, device="cuda", compute_type="float16")
+    print("[ASR] Whisper loaded.")
+    return _whisper_model
+
+
+def transcribe_audio(audio_path):
+    """Transcribe audio using the configured Whisper model via faster-whisper (CTranslate2)."""
+    warm_whisper()
 
     print("[ASR] Transcribing...")
     segments, info = _whisper_model.transcribe(
@@ -229,6 +246,38 @@ def run_inference(system_prompt, user_prompt):
     return _run_inference_unsloth(system_prompt, user_prompt)
 
 
+def translate_to_english(hindi_text):
+    """Translate Hindi / Hinglish home-visit text to English via the same
+    Gemma model already loaded in VRAM. On-demand only — never on the
+    main extraction path. Returns plain English text (not JSON)."""
+    import ollama
+
+    text = (hindi_text or "").strip()
+    if not text:
+        return ""
+
+    t0 = time.time()
+    resp = ollama.chat(
+        model=OLLAMA_MODEL,
+        messages=[
+            {"role": "system", "content": (
+                "Translate the following Hindi or Hinglish conversation into clear, natural English. "
+                "Preserve speaker labels (ASHA / Patient / Mother) and clinical numbers exactly. "
+                "Do not add commentary or explanations — output ONLY the translation."
+            )},
+            {"role": "user", "content": text},
+        ],
+        options={"temperature": 0.1, "num_ctx": 4096, "num_gpu": 999},
+        keep_alive=os.environ.get("OLLAMA_KEEP_ALIVE", "10m"),
+    )
+    elapsed = time.time() - t0
+
+    out = resp.message.content.strip()
+    tok_s = resp.eval_count / (resp.eval_duration / 1e9) if resp.eval_duration else 0
+    print(f"[LLM] Translate: {elapsed:.1f}s ({resp.eval_count} tok, {tok_s:.0f} tok/s)")
+    return out
+
+
 def _run_inference_ollama(system_prompt, user_prompt):
     """Run inference via Ollama API — fast GGUF on GPU with JSON mode."""
     import ollama
@@ -242,7 +291,7 @@ def _run_inference_ollama(system_prompt, user_prompt):
         ],
         format="json",
         options={"temperature": 0.1, "num_ctx": 4096, "num_gpu": 999},
-        keep_alive="10m",
+        keep_alive=os.environ.get("OLLAMA_KEEP_ALIVE", "10m"),
     )
     elapsed = time.time() - t0
 
@@ -378,7 +427,7 @@ def _run_danger_fc(transcript, visit_type):
         ],
         tools=tools,
         options={"temperature": 0.1, "num_ctx": 4096, "num_gpu": 999},
-        keep_alive="10m",
+        keep_alive=os.environ.get("OLLAMA_KEEP_ALIVE", "10m"),
     )
     elapsed = time.time() - t0
 
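The `keep_alive` change replaces a hardcoded literal with an environment lookup that falls back to the old value. As a small illustration, the helper below shows which value a call site ends up with in each case. `resolve_keep_alive` and `DEFAULT_KEEP_ALIVE` are hypothetical names introduced here; the diff inlines the `os.environ.get(...)` call at each call site.

```python
import os

DEFAULT_KEEP_ALIVE = "10m"  # fallback used in the diff when the variable is unset


def resolve_keep_alive(env=None):
    """Return the keep_alive value a call site would pass along."""
    env = os.environ if env is None else env
    return env.get("OLLAMA_KEEP_ALIVE", DEFAULT_KEEP_ALIVE)


# With the Dockerfile's ENV in place, the container default applies.
print(resolve_keep_alive({"OLLAMA_KEEP_ALIVE": "24h"}))
# Without it (for example in local dev), the old "10m" behaviour is preserved.
print(resolve_keep_alive({}))
```

Note that a variable set to an empty string is passed through unchanged, since `dict.get` only falls back when the key is missing.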
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7572ab4c67cae5373451aae322882931e36fa9f643e33e2743200ce2b01d0e76
+size 118010
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2f36a1da62e4db68867036d72200fd5813087c6888a645a471e928144700e045
+size 46456
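The two `.ogg` entries in this commit are Git LFS pointer files, not audio bytes: each records a spec version, a content hash, and the real file size. A minimal sketch of reading one is shown below. `parse_lfs_pointer` is a hypothetical helper, and it assumes the simple three-line `key value` layout seen in the diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:2f36a1da62e4db68867036d72200fd5813087c6888a645a471e928144700e045
size 46456
"""

info = parse_lfs_pointer(POINTER)
print(info["algo"], info["size"])
```

A check like this can be useful in a build step to confirm that the clips were actually fetched from LFS: a file that still parses as a pointer has not been replaced by the audio itself.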
@@ -0,0 +1,20 @@
+[
+  {
+    "id": "anc_preeclampsia_short",
+    "label": "ANC — Preeclampsia (short, 20s)",
+    "file": "anc_preeclampsia_short.ogg",
+    "duration_s": 20,
+    "visit_type_hint": "anc_visit",
+    "speaker": "male",
+    "description": "Short ANC clip. ASHA reads BP 155/100 — the danger-sign threshold trigger. Demonstrates the danger pipeline end-to-end on a clip a judge will actually sit through."
+  },
+  {
+    "id": "anc_preeclampsia_full",
+    "label": "ANC — Preeclampsia (full, 52s)",
+    "file": "anc_preeclampsia_full.ogg",
+    "duration_s": 52,
+    "visit_type_hint": "anc_visit",
+    "speaker": "male",
+    "description": "Full ANC home-visit role-play. Headache, blurred vision, facial swelling, BP 155/100, late-pregnancy context (≈8 months) — multiple preeclampsia danger signs plus a PHC-referral decision."
+  }
+]
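The manifest stores only file names; the `/api/audio-examples` handler adds a `url` field at request time. The sketch below reproduces that step with the standard library so it can be run without FastAPI. `load_audio_examples` is a hypothetical name, and unlike the handler in the diff it returns copies of the entries instead of mutating them in place.

```python
import json
import os
import tempfile


def load_audio_examples(demo_dir: str) -> list:
    """Read manifest.json from demo_dir and attach a /audio/<file> URL to each entry."""
    manifest_path = os.path.join(demo_dir, "manifest.json")
    if not os.path.isfile(manifest_path):
        return []  # a missing bundle degrades to "no examples" instead of an error
    with open(manifest_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    return [{**e, "url": f"/audio/{e['file']}"} for e in entries]


# Exercise the helper against a throwaway manifest.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "manifest.json"), "w", encoding="utf-8") as f:
        json.dump([{"id": "anc_preeclampsia_short", "file": "anc_preeclampsia_short.ogg"}], f)
    examples = load_audio_examples(d)

print(examples[0]["url"])
```

Keeping the URL out of the manifest means the mount path (`/audio`) is defined in one place on the server and the JSON stays portable.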
@@ -238,6 +238,18 @@ select {
   font-size: 13px;
 }
 
+.translate-row {
+  display: flex;
+  align-items: center;
+  gap: 12px;
+  margin-top: 12px;
+}
+
+.error-text {
+  color: #991b1b;
+  font-size: 13px;
+}
+
 .text-input {
   width: 100%;
   min-height: 220px;
@@ -204,6 +204,16 @@ function App() {
   const [audioUrl, setAudioUrl] = useState('')
   const [isRecording, setIsRecording] = useState(false)
 
+  // Curated audio examples served by the backend at /api/audio-examples.
+  // Lets a judge play a real role-play clip + run extraction without
+  // needing their own Hindi audio.
+  const [audioExamples, setAudioExamples] = useState([])
+  const [selectedAudioExample, setSelectedAudioExample] = useState('')
+
+  // On-demand English translation of the Hindi transcript. Not in the main
+  // extraction path — fires only when the reviewer clicks Translate.
+  const [translation, setTranslation] = useState({ loading: false, english: '', error: '' })
+
   const mediaRecorderRef = useRef(null)
   const streamRef = useRef(null)
   const chunksRef = useRef([])
@@ -261,7 +271,11 @@ function App() {
     fetch(`${API_BASE}/api/health`)
       .then((r) => r.json())
       .then((d) => {
-        setHealth(
+        setHealth(
+          d.whisper
+            ? `API: ${d.status} · LLM: ${d.model} · ASR: ${d.whisper}`
+            : `API: ${d.status} · Model: ${d.model}`
+        )
         setApiReachable(true)
       })
       .catch(() => {
@@ -280,8 +294,52 @@ function App() {
         }
       })
       .catch(() => {})
+
+    fetch(`${API_BASE}/api/audio-examples`)
+      .then((r) => r.json())
+      .then((data) => setAudioExamples(Array.isArray(data) ? data : []))
+      .catch(() => {})
   }, [])
 
+  async function loadAudioExample(id) {
+    setSelectedAudioExample(id)
+    if (!id) return
+    const ex = audioExamples.find((e) => e.id === id)
+    if (!ex) return
+    setVoiceState((s) => ({ ...s, error: '' }))
+    setTranslation({ loading: false, english: '', error: '' })
+    try {
+      const res = await fetch(`${API_BASE}${ex.url}`)
+      if (!res.ok) throw new Error(`fetch ${ex.url} → ${res.status}`)
+      const blob = await res.blob()
+      const file = new File([blob], ex.file, { type: blob.type || 'audio/ogg' })
+      if (audioUrl) URL.revokeObjectURL(audioUrl)
+      setAudioFile(file)
+      setAudioUrl(URL.createObjectURL(file))
+      if (ex.visit_type_hint) setRecordingVisitType(ex.visit_type_hint)
+    } catch (err) {
+      setVoiceState((s) => ({ ...s, error: `Could not load example: ${err.message}` }))
+    }
+  }
+
+  async function translateTranscript() {
+    const text = (voiceState.transcript || '').trim()
+    if (!text) return
+    setTranslation({ loading: true, english: '', error: '' })
+    try {
+      const res = await fetch(`${API_BASE}/api/translate`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ text }),
+      })
+      if (!res.ok) throw new Error(`HTTP ${res.status}`)
+      const data = await res.json()
+      setTranslation({ loading: false, english: data.english || '', error: '' })
+    } catch (err) {
+      setTranslation({ loading: false, english: '', error: err.message })
+    }
+  }
+
   function saveServerUrl() {
     const cleaned = (serverUrlInput || '').trim().replace(/\/+$/, '')
     if (!cleaned) {
@@ -964,6 +1022,34 @@ function App() {
               visitType={recordingVisitType}
               setVisitType={setRecordingVisitType}
             />
+            {audioExamples.length > 0 && (
+              <div className="card">
+                <div className="text-tools">
+                  <select
+                    value={selectedAudioExample}
+                    onChange={(e) => loadAudioExample(e.target.value)}
+                  >
+                    <option value="">Load voice example...</option>
+                    {audioExamples.map((ex) => (
+                      <option key={ex.id} value={ex.id}>
+                        {ex.label}
+                      </option>
+                    ))}
+                  </select>
+                  <button
+                    className="btn primary"
+                    onClick={processVoice}
+                    disabled={!audioFile || voiceState.loading}
+                  >
+                    {voiceState.loading ? 'Processing...' : 'Extract from example'}
+                  </button>
+                </div>
+                {selectedAudioExample && (() => {
+                  const ex = audioExamples.find((e) => e.id === selectedAudioExample)
+                  return ex ? <p className="field-desc">{ex.description}</p> : null
+                })()}
+              </div>
+            )}
             <div className="card">
               <div className="audio-tools audio-tools-3">
                 <button className={`btn ${isRecording ? 'danger' : ''}`} onClick={isRecording ? stopRecording : startRecording}>
@@ -983,6 +1069,26 @@ function App() {
             <div className="card">
               <h3>Transcript</h3>
               <pre className="transcript">{voiceState.transcript || 'Transcript will appear here after processing audio.'}</pre>
+              {voiceState.transcript && (
+                <div className="translate-row">
+                  <button
+                    className="btn secondary"
+                    onClick={translateTranscript}
+                    disabled={translation.loading}
+                  >
+                    {translation.loading ? 'Translating...' : 'Translate to English'}
+                  </button>
+                  {translation.error && (
+                    <span className="error-text">{translation.error}</span>
+                  )}
+                </div>
+              )}
+              {translation.english && (
+                <>
+                  <h3 style={{ marginTop: 16 }}>English translation</h3>
+                  <pre className="transcript">{translation.english}</pre>
+                </>
+              )}
             </div>
           </section>
         )}