---
license: apache-2.0
tags:
- automatic-speech-recognition
- whisper
- windyword
- pashto
- ps
library_name: transformers
pipeline_tag: automatic-speech-recognition
language:
- ps
---

# WindyWord.ai STT — Pashto Lingua (GPU (safetensors))

**Transcribes Pashto speech (Indo-European > Indo-Iranian > Iranian).**

> **Note:** **EXCELLENT tier when used correctly.** Derived from `ihanif/whisper-medium-pashto`. Verified at WER 5.3% / CER 3.2% / script-match 99.2% on a 50-sample FLEURS `ps_af` evaluation *when inference uses* `forced_decoder_ids`, built with `processor.get_decoder_prompt_ids(language='pashto', task='transcribe')` and passed explicitly to `model.generate()`. With the convenience `language=` kwarg instead, the model can silently drop the Pashto language token and hallucinate English-script output on ~30% of samples, producing an artifactual 53.7% WER. Always force the decoder prompt for Pashto inference.

## Quality

- **WER:** 5.3% (CER 3.2%, script match 99.2%) on 50 FLEURS `ps_af` samples, measured with the forced decoder prompt described in the note above. Weights are imported from an upstream community fine-tune.

## About this variant

This is the **safetensors** deployment format of our Pashto Lingua STT model. Load it from the `safetensors/` subfolder of the repository.

Part of the [WindyWord.ai](https://windyword.ai) STT fleet, which covers 35+ languages that commercial speech-to-text APIs underserve, with dialect and script disclosures where they matter.

## Usage

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("WindyWord/listen-windy-lingua-ps", subfolder="safetensors")
model = WhisperForConditionalGeneration.from_pretrained("WindyWord/listen-windy-lingua-ps", subfolder="safetensors")
```

## Commercial Use

Visit [windyword.ai](https://windyword.ai) for apps and API access.

---

## Provenance & License

Weights are derived from an upstream community Whisper fine-tune (`ihanif/whisper-medium-pashto`; see its model card for exact lineage) and redistributed under Apache-2.0 (inherited).

*Certified by Opus 4.6 Opus-Claw (Dr. C) on Veron-1 (RTX 5090, Mt Pleasant SC).*
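## Forced-prompt inference sketch

The Pashto note at the top of this card recommends building `forced_decoder_ids` with `processor.get_decoder_prompt_ids(language='pashto', task='transcribe')` and passing them explicitly to `model.generate()`. The sketch below shows one way to wire that up end to end. It is an illustrative example, not the official WindyWord pipeline: the helper names `load_pashto_model` and `transcribe_pashto`, the use of `librosa` for audio loading, and the file `sample_ps.wav` are hypothetical assumptions. It also assumes a mono waveform at 16 kHz, the sampling rate Whisper expects.

```python
"""Minimal Pashto transcription sketch with an explicitly forced decoder prompt."""


def load_pashto_model(repo_id="WindyWord/listen-windy-lingua-ps"):
    """Load processor and model from the repo's `safetensors/` subfolder (hypothetical helper)."""
    # Imported lazily so transcribe_pashto() can be reused with any
    # processor/model pair without importing transformers up front.
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    processor = WhisperProcessor.from_pretrained(repo_id, subfolder="safetensors")
    model = WhisperForConditionalGeneration.from_pretrained(repo_id, subfolder="safetensors")
    return processor, model


def transcribe_pashto(audio, processor, model, sampling_rate=16_000):
    """Transcribe one mono waveform (hypothetical helper; Whisper expects 16 kHz audio)."""
    # Convert the raw waveform into log-Mel input features as a PyTorch tensor.
    inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
    # Build the forced prompt (Pashto language token, transcribe task token)
    # explicitly instead of relying on the `language=` convenience kwarg,
    # which this card reports can silently drop the Pashto token.
    forced_decoder_ids = processor.get_decoder_prompt_ids(language="pashto", task="transcribe")
    predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_decoder_ids)
    # Strip special tokens and return the single decoded transcript.
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]


# Example usage (assumes `librosa` is installed; "sample_ps.wav" is a
# hypothetical local recording):
# import librosa
# audio, _ = librosa.load("sample_ps.wav", sr=16_000)
# processor, model = load_pashto_model()
# print(transcribe_pashto(audio, processor, model))
```

Keeping the prompt construction inside `transcribe_pashto` means every call goes through the forced-prompt path, so the `language=` shortcut cannot slip back in by accident.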