Tushar9802 committed d3595cb (1 parent: 5575d97)

feat(audio): demo clips + Translate + Whisper warm + CT2 mirror

- demo_audio/ ships two curated ANC preeclampsia recordings (20s + 52s)
  with a manifest. The Voice tab gets a "Load voice example" dropdown
  that fetches the clip, loads it into the player, and presets the visit
  type; an "Extract from example" button then runs extraction. Two
  reviewer-facing motivations: (a) the text-only demo undersold the voice
  path; (b) the audio path is the actual product moat.
- /api/audio-examples serves the manifest with synthesized /audio/<file>
  URLs. /audio/* is a static mount of the bundle, registered before the
  SPA catch-all so the catch-all cannot shadow it. .dockerignore
  allowlists demo_audio/*.ogg, *.wav and *.mp3 so the blanket audio
  excludes don't silently drop the curated clips at build time.
- /api/translate exposes Hindi/Hinglish -> English via the resident
  Gemma. A new "Translate to English" button under the transcript runs
  only on demand, so the main extraction path stays fast. ~3-5s on a hot
  T4.
- Whisper now eager-loads from the FastAPI startup hook, so the Space
  reports ready only once both LLM and ASR are resident and the first
  user audio request no longer pays a ~60s cold load. If the eager load
  fails (no GPU, network blip), startup logs a warning and Whisper falls
  back to lazy loading on the first audio request. The WHISPER_MODEL env
  var defaults to Tushar9802/whisper-large-v2-hindi-ct2, a CT2 mirror of
  collabora/whisper-large-v2-hindi: faster-whisper requires CT2 format,
  and the source repo is in transformers format. For local dev, a
  models/whisper-hindi-ct2/ directory takes precedence over the env var.
- /api/health returns the LLM and ASR model names; the frontend status
  line shows both.
- Ollama keep_alive at both existing call sites (and in the new translate
  call) now reads OLLAMA_KEEP_ALIVE from the environment, so the 24h
  container default actually takes effect. Previously the hardcoded
  "10m" silently overrode it.
- README touch-ups: the ASR row notes the CT2 mirror, and the HF Space
  cache section reflects eager loading + persistent caching.
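Why the hardcoded value won: the server-side `OLLAMA_KEEP_ALIVE` only sets Ollama's default, and a `keep_alive` passed with a chat or generate request overrides it. A minimal sketch of centralizing the per-request value together with the shared options, assuming hypothetical helper names `ollama_keep_alive` and `ollama_call_kwargs` (the commit instead repeats the `os.environ.get(...)` expression at each call site). Unlike that expression, this version also treats a blank value as unset.

```python
import os
from typing import Any, Dict, Mapping, Optional


def ollama_keep_alive(env: Optional[Mapping[str, str]] = None, default: str = "10m") -> str:
    """Resolve the keep_alive to send with each Ollama request.

    Reads OLLAMA_KEEP_ALIVE (e.g. "24h" in the container) and falls back to
    `default`; a blank value is treated as unset.
    """
    source = os.environ if env is None else env
    value = source.get("OLLAMA_KEEP_ALIVE", "").strip()
    return value or default


def ollama_call_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Keyword arguments shared by every ollama.chat() call site (hypothetical).

    Builds a fresh dict on every call so a caller can extend it safely.
    """
    return {
        "options": {"temperature": 0.1, "num_ctx": 4096, "num_gpu": 999},
        "keep_alive": ollama_keep_alive(env),
    }
```

A call site would then read `ollama.chat(model=OLLAMA_MODEL, messages=messages, format="json", **ollama_call_kwargs())`, so a future change to the options or the fallback touches one place.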

.dockerignore CHANGED
@@ -72,6 +72,11 @@ app.log
 *.mpeg
 *.flac
 !tests/fixtures/*.wav
+# Curated reviewer-facing demo clips must reach the image — these are tiny
+# (≈100 KB each) and served by the FastAPI static mount at /audio/*.
+!demo_audio/*.ogg
+!demo_audio/*.wav
+!demo_audio/*.mp3

 # Secrets
 .env
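The allowlist relies on .dockerignore's last-match-wins rule: a later `!pattern` re-includes files that an earlier pattern excluded, so moving these lines above the blanket exclude would undo them. A simplified sketch of that evaluation, assuming the blanket exclude elsewhere in the file has the form `**/*.ogg` (the actual line sits outside this hunk). It approximates Docker's matching (`*` and `?` stay within one path segment, `**` spans directories) and is not Docker's implementation; `is_excluded` is a hypothetical name.

```python
import re
from typing import Sequence


def _pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a .dockerignore-style glob into an anchored regex.

    '*' and '?' never cross '/', '**/' matches zero or more directories,
    and a bare '**' matches anything.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def is_excluded(path: str, rules: Sequence[str]) -> bool:
    """Last matching rule wins; a leading '!' turns a match into an include."""
    excluded = False
    for rule in rules:
        rule = rule.strip()
        if not rule or rule.startswith("#"):
            continue  # blank lines and comments are ignored
        negate = rule.startswith("!")
        pattern = rule[1:] if negate else rule
        if _pattern_to_regex(pattern).match(path):
            excluded = not negate
    return excluded
```

With `["**/*.ogg", "!demo_audio/*.ogg"]` the curated clips stay in the build context while other `.ogg` files are dropped; reversing the two rules excludes the clips again.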
.gitattributes ADDED
@@ -0,0 +1,3 @@
+*.ogg filter=lfs diff=lfs merge=lfs -text
+*.wav filter=lfs diff=lfs merge=lfs -text
+*.mp3 filter=lfs diff=lfs merge=lfs -text
Dockerfile CHANGED
@@ -63,6 +63,7 @@ COPY app.py api.py ./
 COPY src/ ./src/
 COPY configs/ ./configs/
 COPY scripts/ ./scripts/
+COPY demo_audio/ ./demo_audio/
 COPY FAILURES.md JUDGE_BRIEF.md README.md ./
 COPY entrypoint.sh ./
 RUN chmod +x entrypoint.sh
@@ -75,7 +76,8 @@ ENV PORT=7860 \
     OLLAMA_MODEL=gemma4:e4b-it-q4_K_M \
     OLLAMA_MODELS=/data/.ollama/models \
     HF_HOME=/data/.cache/huggingface \
-    OLLAMA_KEEP_ALIVE=24h
+    OLLAMA_KEEP_ALIVE=24h \
+    WHISPER_MODEL=Tushar9802/whisper-large-v2-hindi-ct2

 EXPOSE 7860

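With `WHISPER_MODEL` now baked into the image, `warm_whisper()` picks the ASR source in a fixed order: a local pre-converted CT2 directory first, then the env var, then the built-in default. A minimal sketch of that precedence as a pure function, so it can be checked without faster-whisper or a GPU; `resolve_whisper_source` is a hypothetical name, and the directory layout and default repo id are taken from the diff.

```python
import os
from typing import Mapping

# Default from app.py in this commit.
DEFAULT_WHISPER_MODEL = "Tushar9802/whisper-large-v2-hindi-ct2"


def resolve_whisper_source(app_dir: str, env: Mapping[str, str]) -> str:
    """Return the local path or Hub repo id to hand to faster-whisper.

    Precedence (mirrors warm_whisper()):
      1. a pre-converted CT2 directory at <app_dir>/models/whisper-hindi-ct2
      2. the WHISPER_MODEL environment variable
      3. the built-in default CT2 mirror
    """
    ct2_path = os.path.join(app_dir, "models", "whisper-hindi-ct2")
    if os.path.exists(ct2_path):
        return ct2_path
    return env.get("WHISPER_MODEL", DEFAULT_WHISPER_MODEL)
```

Keeping the decision separate from `WhisperModel(...)` construction makes the override order testable on a CPU-only machine.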
README.md CHANGED
@@ -65,7 +65,7 @@ The pipeline uses a hybrid design: form extraction via `format="json"` (proven p

 | Component | Model | Size | Role | Deployment |
 |-----------|-------|------|------|------------|
-| ASR (workstation path only) | collabora/whisper-large-v2-hindi | ~1.5 GB | Hindi speech → text via faster-whisper/CTranslate2 | Workstation |
+| ASR (workstation path only) | collabora/whisper-large-v2-hindi (served as the CTranslate2 mirror Tushar9802/whisper-large-v2-hindi-ct2 — faster-whisper requires CT2 format) | ~1.5 GB | Hindi speech → text via faster-whisper/CTranslate2 | Workstation |
 | Normalization | src/hindi_normalize.py | — | Hindi number words → digits, medical term mapping | Shared (Python server-side; JS port for phone) |
 | Clinical Extraction (health-center mode, audio-in) | Gemma 4 E4B (Q4_K_M via Ollama) | ~5 GB | Function calling: form extraction + danger signs + referral | Workstation (GPU) |
 | Clinical Extraction (field mode, text-in) | Gemma 4 E2B (INT4 via Cactus SDK) | ~4.4 GB download / ~6.3 GB on-device extracted (multimodal package includes audio + vision encoders that the text-in path does not use) | Same extraction schema, plain-JSON mode (E2B INT4 does not reliably emit OpenAI-style `tool_calls`) | Android (ARM, Snapdragon 7+ Gen 1 or newer, 8 GB RAM, ~7 GB free storage for the one-time install) |
@@ -289,7 +289,7 @@ git push hf master
 # ~7 GB and the first request waits 3–5 min)
 ```

-On first boot the container pulls `gemma4:e4b-it-q4_K_M` into the persistent volume (~3 min). Subsequent restarts are instant. Whisper-Large CT2 downloads from HF Hub on the first audio request and stays cached under `$HF_HOME`.
+On first boot the container pulls `gemma4:e4b-it-q4_K_M` into the persistent volume (~3 min) and warms it with a one-token generate so the first user request lands hot. The FastAPI startup hook eagerly loads Whisper-Large CT2 from `Tushar9802/whisper-large-v2-hindi-ct2` (~3 GB, cached under `$HF_HOME` on the persistent volume after the first boot). The Space only reports ready when both models are resident. Subsequent restarts read everything from `/data` and are fast.

 **Subsequent updates:** `git push hf master` after any code change; HF rebuilds and redeploys.

api.py CHANGED
@@ -34,6 +34,9 @@ from app import (
     init_schemas,
     validate_form_output,
     postprocess_transcript,
+    translate_to_english,
+    warm_whisper,
+    WHISPER_MODEL,
 )

 app = FastAPI(title="Sakhi API", version="1.0.0")
@@ -46,10 +49,17 @@ app.add_middleware(
     allow_headers=["*"],
 )

-# Load schemas on startup models load lazily on first request (like Gradio)
+# Startup: load schemas + pre-warm Whisper so the Space only reports ready
+# when the audio path is hot. Whisper load is wrapped in try/except — if the
+# eager load fails (no GPU, network blip), fall back to lazy loading on
+# first audio request instead of blocking the whole boot.
 @app.on_event("startup")
 def startup():
     init_schemas()
+    try:
+        warm_whisper()
+    except Exception as e:
+        print(f"[startup] WARN: Whisper pre-warm failed ({e!r}); falling back to lazy load")


 # ── Models ──
@@ -71,6 +81,10 @@ class TextRequest(BaseModel):
     metadata: Optional[PatientMetadata] = None


+class TranslateRequest(BaseModel):
+    text: str
+
+
 class ExtractionResult(BaseModel):
     visit_type: str
     form: Optional[dict] = None
@@ -94,7 +108,11 @@ def _metadata_dict(meta):
 # ── Endpoints ──
 @app.get("/api/health")
 def health():
-    return {"status": "ok", "model": os.environ.get("OLLAMA_MODEL", "gemma4:e4b-it-q4_K_M")}
+    return {
+        "status": "ok",
+        "model": os.environ.get("OLLAMA_MODEL", "gemma4:e4b-it-q4_K_M"),
+        "whisper": WHISPER_MODEL,
+    }


 @app.get("/api/examples")
@@ -107,6 +125,34 @@ def examples():
     # index 1 = "ANC Visit — Preeclampsia (DANGER)" — best for demo (has danger signs)


+@app.post("/api/translate")
+def translate(req: TranslateRequest):
+    """Hindi / Hinglish → English. Uses the same Gemma model already in VRAM,
+    so the cost is one extra ~3-5s LLM call. Reviewer-facing convenience;
+    never invoked from the main extraction path."""
+    t0 = time.time()
+    english = translate_to_english(req.text)
+    return {"english": english, "time_s": round(time.time() - t0, 2)}
+
+
+_DEMO_AUDIO_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "demo_audio")
+
+
+@app.get("/api/audio-examples")
+def audio_examples():
+    """Curated voice clips bundled into the image. Returns playable URLs
+    relative to the Space origin, so the frontend can both <audio src=...>
+    them and re-POST them to /api/process-audio-stream."""
+    manifest_path = os.path.join(_DEMO_AUDIO_DIR, "manifest.json")
+    if not os.path.isfile(manifest_path):
+        return []
+    with open(manifest_path, "r", encoding="utf-8") as f:
+        entries = json.load(f)
+    for e in entries:
+        e["url"] = f"/audio/{e['file']}"
+    return entries
+
+
 @app.post("/api/process-text", response_model=ExtractionResult)
 def process_text(req: TextRequest):
     t_total = time.time()
@@ -334,6 +380,13 @@ async def process_audio_stream(
     return StreamingResponse(generate(), media_type="text/event-stream")


+# Serve curated demo audio under /audio/* so the frontend can <audio src=...>
+# them. Must be mounted BEFORE the SPA catch-all below; otherwise the
+# StaticFiles for `/` would swallow these paths.
+if os.path.isdir(_DEMO_AUDIO_DIR):
+    app.mount("/audio", StaticFiles(directory=_DEMO_AUDIO_DIR), name="demo_audio")
+
+
 # Serve built React frontend at / when dist exists (unified desktop UI for health centers).
 # Must be mounted AFTER all /api/* routes so they take priority.
 _FRONTEND_DIST = os.path.join(os.path.dirname(os.path.abspath(__file__)), "frontend", "dist")
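The ordering comments above follow from how Starlette (and therefore FastAPI) routes: it checks routes and mounts in registration order and hands the request to the first one that fully matches, and a mount at `/` matches every path. A toy, framework-free sketch of that first-match dispatch; `PrefixRouter`, `serve_audio` and `serve_spa` are hypothetical names, not Starlette's API.

```python
from typing import Callable, List, Tuple

Handler = Callable[[str], str]


class PrefixRouter:
    """Toy router: the first registered prefix that matches wins."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, Handler]] = []

    def mount(self, prefix: str, handler: Handler) -> None:
        self._routes.append((prefix.rstrip("/"), handler))

    def dispatch(self, path: str) -> str:
        for prefix, handler in self._routes:
            # "" (the root mount) matches everything; otherwise require a whole
            # path segment so "/audio" does not capture "/audiobook".
            if prefix == "" or path == prefix or path.startswith(prefix + "/"):
                return handler(path)
        raise LookupError(f"no route for {path}")


def serve_audio(path: str) -> str:
    return f"audio file {path[len('/audio/'):]}"


def serve_spa(path: str) -> str:
    return "index.html"


if __name__ == "__main__":
    good = PrefixRouter()
    good.mount("/audio", serve_audio)  # specific mount first
    good.mount("/", serve_spa)         # catch-all last
    print(good.dispatch("/audio/anc_preeclampsia_short.ogg"))  # audio file anc_preeclampsia_short.ogg

    bad = PrefixRouter()
    bad.mount("/", serve_spa)          # catch-all first shadows everything
    bad.mount("/audio", serve_audio)
    print(bad.dispatch("/audio/anc_preeclampsia_short.ogg"))   # index.html
```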
app.py CHANGED
@@ -27,6 +27,15 @@ OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "gemma4:e4b-it-q4_K_M")
 USE_OLLAMA = os.environ.get("USE_OLLAMA", "1") == "1"
 USE_FUNCTION_CALLING = os.environ.get("USE_FUNCTION_CALLING", "1") == "1"

+# Whisper config. Default = the CTranslate2-converted mirror of collabora's
+# Hindi fine-tune of whisper-large-v2 (selected after session 19's real-voice
+# validation pass). faster-whisper requires CT2 format; the original
+# collabora/ repo is transformers format and won't load directly.
+# Override with WHISPER_MODEL for evals against other variants. Local dev
+# with a pre-converted CT2 directory at models/whisper-hindi-ct2/ takes
+# precedence over this env var — see warm_whisper().
+WHISPER_MODEL = os.environ.get("WHISPER_MODEL", "Tushar9802/whisper-large-v2-hindi-ct2")
+
 # System prompts (same as training)
 FORM_SYSTEM_PROMPT = (
     "You are a clinical data extraction system for India's ASHA health worker program. "
@@ -189,20 +198,28 @@ from src.hindi_normalize import normalize_transcript as postprocess_transcript

 _whisper_model = None

-def transcribe_audio(audio_path):
-    """Transcribe audio using collabora/whisper-large-v2-hindi via faster-whisper (CTranslate2)."""
+def warm_whisper():
+    """Eagerly load the Whisper model into VRAM. Idempotent — safe to call
+    multiple times; subsequent calls return the cached singleton. Called from
+    FastAPI's startup hook so the first user audio request lands hot."""
     global _whisper_model
-    if _whisper_model is None:
-        from faster_whisper import WhisperModel
-        import os
-        ct2_path = os.path.join(os.path.dirname(__file__), "models", "whisper-hindi-ct2")
-        if os.path.exists(ct2_path):
-            print(f"[ASR] Loading CTranslate2 model from {ct2_path}...")
-            _whisper_model = WhisperModel(ct2_path, device="cuda", compute_type="float16")
-        else:
-            print("[ASR] CT2 model not found, loading from HuggingFace (slower)...")
-            _whisper_model = WhisperModel("collabora/whisper-large-v2-hindi", device="cuda", compute_type="float16")
-        print("[ASR] Whisper loaded.")
+    if _whisper_model is not None:
+        return _whisper_model
+    from faster_whisper import WhisperModel
+    ct2_path = os.path.join(os.path.dirname(__file__), "models", "whisper-hindi-ct2")
+    if os.path.exists(ct2_path):
+        print(f"[ASR] Loading CTranslate2 model from {ct2_path}...")
+        _whisper_model = WhisperModel(ct2_path, device="cuda", compute_type="float16")
+    else:
+        print(f"[ASR] Loading {WHISPER_MODEL} from HuggingFace Hub...")
+        _whisper_model = WhisperModel(WHISPER_MODEL, device="cuda", compute_type="float16")
+    print("[ASR] Whisper loaded.")
+    return _whisper_model
+
+
+def transcribe_audio(audio_path):
+    """Transcribe audio using the configured Whisper model via faster-whisper (CTranslate2)."""
+    warm_whisper()

     print("[ASR] Transcribing...")
     segments, info = _whisper_model.transcribe(
@@ -229,6 +246,38 @@ def run_inference(system_prompt, user_prompt):
     return _run_inference_unsloth(system_prompt, user_prompt)


+def translate_to_english(hindi_text):
+    """Translate Hindi / Hinglish home-visit text to English via the same
+    Gemma model already loaded in VRAM. On-demand only — never on the
+    main extraction path. Returns plain English text (not JSON)."""
+    import ollama
+
+    text = (hindi_text or "").strip()
+    if not text:
+        return ""
+
+    t0 = time.time()
+    resp = ollama.chat(
+        model=OLLAMA_MODEL,
+        messages=[
+            {"role": "system", "content": (
+                "Translate the following Hindi or Hinglish conversation into clear, natural English. "
+                "Preserve speaker labels (ASHA / Patient / Mother) and clinical numbers exactly. "
+                "Do not add commentary or explanations — output ONLY the translation."
+            )},
+            {"role": "user", "content": text},
+        ],
+        options={"temperature": 0.1, "num_ctx": 4096, "num_gpu": 999},
+        keep_alive=os.environ.get("OLLAMA_KEEP_ALIVE", "10m"),
+    )
+    elapsed = time.time() - t0
+
+    out = resp.message.content.strip()
+    tok_s = resp.eval_count / (resp.eval_duration / 1e9) if resp.eval_duration else 0
+    print(f"[LLM] Translate: {elapsed:.1f}s ({resp.eval_count} tok, {tok_s:.0f} tok/s)")
+    return out
+
+
 def _run_inference_ollama(system_prompt, user_prompt):
     """Run inference via Ollama API — fast GGUF on GPU with JSON mode."""
     import ollama
@@ -242,7 +291,7 @@ def _run_inference_ollama(system_prompt, user_prompt):
         ],
         format="json",
         options={"temperature": 0.1, "num_ctx": 4096, "num_gpu": 999},
-        keep_alive="10m",
+        keep_alive=os.environ.get("OLLAMA_KEEP_ALIVE", "10m"),
     )
     elapsed = time.time() - t0

@@ -378,7 +427,7 @@ def _run_danger_fc(transcript, visit_type):
         ],
         tools=tools,
         options={"temperature": 0.1, "num_ctx": 4096, "num_gpu": 999},
-        keep_alive="10m",
+        keep_alive=os.environ.get("OLLAMA_KEEP_ALIVE", "10m"),
     )
     elapsed = time.time() - t0

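`warm_whisper()` is idempotent, but its check-then-load is not guarded. FastAPI can run plain `def` endpoints in a thread pool, so if the startup pre-warm failed, two concurrent first audio requests could both start a multi-GB load. A minimal sketch of a lock-guarded variant (double-checked locking), with the loader injected so it can be exercised without faster-whisper or a GPU; `LazySingleton` is a hypothetical name, not part of this commit.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Load a value at most once, even when get() races across threads."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._value: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        value = self._value
        if value is not None:  # fast path: already loaded, no locking
            return value
        with self._lock:
            if self._value is None:  # re-check: another thread may have loaded it
                self._value = self._loader()  # if this raises, the next get() retries
            return self._value


# Hypothetical wiring in app.py:
#   _whisper = LazySingleton(
#       lambda: WhisperModel(WHISPER_MODEL, device="cuda", compute_type="float16"))
#   warm_whisper = _whisper.get
```

Because a failed load leaves the value unset, the "fall back to lazy load" behavior of the startup hook is preserved: the next audio request simply retries.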
demo_audio/anc_preeclampsia_full.ogg ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7572ab4c67cae5373451aae322882931e36fa9f643e33e2743200ce2b01d0e76
+size 118010
demo_audio/anc_preeclampsia_short.ogg ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2f36a1da62e4db68867036d72200fd5813087c6888a645a471e928144700e045
+size 46456
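The two `.ogg` entries above are Git LFS pointer files, not audio: `.gitattributes` routes every `.ogg`, `.wav` and `.mp3` through the LFS filter. If a checkout or build context ends up holding the pointer instead of the real clip (for example, LFS objects were never fetched), the `/audio` mount would serve a few lines of text. A minimal sketch, assuming the pointer layout shown here (`version`, `oid sha256:<hex>`, `size <bytes>` as space-separated key/value lines), that parses a pointer and checks downloaded bytes against it; the function names are hypothetical.

```python
import hashlib
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    if not fields.get("version", "").startswith("https://git-lfs.github.com/spec/"):
        raise ValueError("not a Git LFS pointer")
    return fields


def matches_pointer(data: bytes, pointer_text: str) -> bool:
    """True if `data` has exactly the size and sha256 recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(data) == int(fields["size"]) and hashlib.sha256(data).hexdigest() == expected
```

For the short clip, a correctly fetched file must be exactly 46456 bytes and hash to the `oid` above; anything else means the pointer, not the audio, was shipped.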
demo_audio/manifest.json ADDED
@@ -0,0 +1,20 @@
+[
+  {
+    "id": "anc_preeclampsia_short",
+    "label": "ANC — Preeclampsia (short, 20s)",
+    "file": "anc_preeclampsia_short.ogg",
+    "duration_s": 20,
+    "visit_type_hint": "anc_visit",
+    "speaker": "male",
+    "description": "Short ANC clip. ASHA reads BP 155/100 — the danger-sign threshold trigger. Demonstrates the danger pipeline end-to-end on a clip a judge will actually sit through."
+  },
+  {
+    "id": "anc_preeclampsia_full",
+    "label": "ANC — Preeclampsia (full, 52s)",
+    "file": "anc_preeclampsia_full.ogg",
+    "duration_s": 52,
+    "visit_type_hint": "anc_visit",
+    "speaker": "male",
+    "description": "Full ANC home-visit role-play. Headache, blurred vision, facial swelling, BP 155/100, late-pregnancy context (≈8 months) — multiple preeclampsia danger signs plus a PHC-referral decision."
+  }
+]
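`/api/audio-examples` returns these entries with a `url` of `/audio/<file>` added. A minimal sketch of the same transformation plus a sanity check that each listed file is actually present in the bundle, so a typo in manifest.json, or a clip dropped by .dockerignore, shows up as a missing dropdown entry rather than a 404 in the browser. The function name, the required-key set, URL-quoting the file name and the skip-on-missing policy are all assumptions, not the commit's behavior.

```python
import json
import os
from typing import Any, Dict, List
from urllib.parse import quote

REQUIRED_KEYS = {"id", "label", "file"}  # assumption: the minimum the dropdown needs


def load_audio_examples(demo_dir: str) -> List[Dict[str, Any]]:
    """Read <demo_dir>/manifest.json and attach a playable /audio/ URL.

    Entries that are malformed or whose audio file is missing are skipped
    instead of being advertised to the frontend.
    """
    manifest_path = os.path.join(demo_dir, "manifest.json")
    if not os.path.isfile(manifest_path):
        return []
    with open(manifest_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    examples: List[Dict[str, Any]] = []
    for entry in entries:
        if not isinstance(entry, dict) or not REQUIRED_KEYS.issubset(entry):
            continue  # malformed entry: skip rather than break the dropdown
        if not os.path.isfile(os.path.join(demo_dir, entry["file"])):
            continue  # listed but not bundled
        examples.append({**entry, "url": "/audio/" + quote(entry["file"])})
    return examples
```

Returning new dicts instead of mutating the loaded entries keeps the function safe to cache or call repeatedly.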
frontend/src/App.css CHANGED
@@ -238,6 +238,18 @@ select {
   font-size: 13px;
 }

+.translate-row {
+  display: flex;
+  align-items: center;
+  gap: 12px;
+  margin-top: 12px;
+}
+
+.error-text {
+  color: #991b1b;
+  font-size: 13px;
+}
+
 .text-input {
   width: 100%;
   min-height: 220px;
frontend/src/App.jsx CHANGED
@@ -204,6 +204,16 @@ function App() {
   const [audioUrl, setAudioUrl] = useState('')
   const [isRecording, setIsRecording] = useState(false)

+  // Curated audio examples served by the backend at /api/audio-examples.
+  // Lets a judge play a real role-play clip + run extraction without
+  // needing their own Hindi audio.
+  const [audioExamples, setAudioExamples] = useState([])
+  const [selectedAudioExample, setSelectedAudioExample] = useState('')
+
+  // On-demand English translation of the Hindi transcript. Not in the main
+  // extraction path — fires only when the reviewer clicks Translate.
+  const [translation, setTranslation] = useState({ loading: false, english: '', error: '' })
+
   const mediaRecorderRef = useRef(null)
   const streamRef = useRef(null)
   const chunksRef = useRef([])
@@ -261,7 +271,11 @@ function App() {
     fetch(`${API_BASE}/api/health`)
       .then((r) => r.json())
       .then((d) => {
-        setHealth(`API: ${d.status} · Model: ${d.model}`)
+        setHealth(
+          d.whisper
+            ? `API: ${d.status} · LLM: ${d.model} · ASR: ${d.whisper}`
+            : `API: ${d.status} · Model: ${d.model}`
+        )
         setApiReachable(true)
       })
       .catch(() => {
@@ -280,8 +294,52 @@ function App() {
         }
       })
       .catch(() => {})
+
+    fetch(`${API_BASE}/api/audio-examples`)
+      .then((r) => r.json())
+      .then((data) => setAudioExamples(Array.isArray(data) ? data : []))
+      .catch(() => {})
   }, [])

+  async function loadAudioExample(id) {
+    setSelectedAudioExample(id)
+    if (!id) return
+    const ex = audioExamples.find((e) => e.id === id)
+    if (!ex) return
+    setVoiceState((s) => ({ ...s, error: '' }))
+    setTranslation({ loading: false, english: '', error: '' })
+    try {
+      const res = await fetch(`${API_BASE}${ex.url}`)
+      if (!res.ok) throw new Error(`fetch ${ex.url} → ${res.status}`)
+      const blob = await res.blob()
+      const file = new File([blob], ex.file, { type: blob.type || 'audio/ogg' })
+      if (audioUrl) URL.revokeObjectURL(audioUrl)
+      setAudioFile(file)
+      setAudioUrl(URL.createObjectURL(file))
+      if (ex.visit_type_hint) setRecordingVisitType(ex.visit_type_hint)
+    } catch (err) {
+      setVoiceState((s) => ({ ...s, error: `Could not load example: ${err.message}` }))
+    }
+  }
+
+  async function translateTranscript() {
+    const text = (voiceState.transcript || '').trim()
+    if (!text) return
+    setTranslation({ loading: true, english: '', error: '' })
+    try {
+      const res = await fetch(`${API_BASE}/api/translate`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ text }),
+      })
+      if (!res.ok) throw new Error(`HTTP ${res.status}`)
+      const data = await res.json()
+      setTranslation({ loading: false, english: data.english || '', error: '' })
+    } catch (err) {
+      setTranslation({ loading: false, english: '', error: err.message })
+    }
+  }
+
   function saveServerUrl() {
     const cleaned = (serverUrlInput || '').trim().replace(/\/+$/, '')
     if (!cleaned) {
@@ -964,6 +1022,34 @@ function App() {
             visitType={recordingVisitType}
             setVisitType={setRecordingVisitType}
           />
+          {audioExamples.length > 0 && (
+            <div className="card">
+              <div className="text-tools">
+                <select
+                  value={selectedAudioExample}
+                  onChange={(e) => loadAudioExample(e.target.value)}
+                >
+                  <option value="">Load voice example...</option>
+                  {audioExamples.map((ex) => (
+                    <option key={ex.id} value={ex.id}>
+                      {ex.label}
+                    </option>
+                  ))}
+                </select>
+                <button
+                  className="btn primary"
+                  onClick={processVoice}
+                  disabled={!audioFile || voiceState.loading}
+                >
+                  {voiceState.loading ? 'Processing...' : 'Extract from example'}
+                </button>
+              </div>
+              {selectedAudioExample && (() => {
+                const ex = audioExamples.find((e) => e.id === selectedAudioExample)
+                return ex ? <p className="field-desc">{ex.description}</p> : null
+              })()}
+            </div>
+          )}
           <div className="card">
             <div className="audio-tools audio-tools-3">
               <button className={`btn ${isRecording ? 'danger' : ''}`} onClick={isRecording ? stopRecording : startRecording}>
@@ -983,6 +1069,26 @@ function App() {
           <div className="card">
             <h3>Transcript</h3>
             <pre className="transcript">{voiceState.transcript || 'Transcript will appear here after processing audio.'}</pre>
+            {voiceState.transcript && (
+              <div className="translate-row">
+                <button
+                  className="btn secondary"
+                  onClick={translateTranscript}
+                  disabled={translation.loading}
+                >
+                  {translation.loading ? 'Translating...' : 'Translate to English'}
+                </button>
+                {translation.error && (
+                  <span className="error-text">{translation.error}</span>
+                )}
+              </div>
+            )}
+            {translation.english && (
+              <>
+                <h3 style={{ marginTop: 16 }}>English translation</h3>
+                <pre className="transcript">{translation.english}</pre>
+              </>
+            )}
           </div>
         </section>
       )}