	---
	language:
	- ky
	license: mit
	library_name: transformers
	pipeline_tag: text-generation
	base_model: google/mt5-small
	tags:
	- mt5
	- text-normalization
	- kyrgyz
	- low-resource
	- turkic
	- continual-pretraining
	datasets:
	- Zarinaaa/kyrgyz-text-normalization
	metrics:
	- cer
	- wer
	- exact_match
	---

	# mT5-small with continual pre-training + fine-tuning for Kyrgyz text normalization

	`google/mt5-small` continually pre-trained on a 538 MB Kyrgyz corpus (news portals + books) with T5-style span corruption, then fine-tuned on 1.67M noisy–clean text pairs for Kyrgyz text normalization.

	This is the continual pre-training + fine-tuning (PT+FT) variant from the camera-ready paper "Kyrgyz Text Normalization: A Comparative Study of Neural and Rule-Based Approaches" (MeLLM Workshop @ ACL 2026). For the fine-tuning-only variant see [Zarinaaa/mt5-small-kyrgyz-normalization](https://huggingface.co/Zarinaaa/mt5-small-kyrgyz-normalization).

	Choosing between the two variants: in our experiments, the additional continual pre-training step did not improve over direct fine-tuning (CER 0.0825 for PT+FT vs. 0.0796 for fine-tune-only, lower is better; p = 0.06). The main observable difference is a higher rate of hallucination (input repetition) in failure cases. For most users we recommend the fine-tune-only variant, unless you specifically want the numerically slightly better Digit–Word performance (see Evaluation below).

	## Usage

	```python
	from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

	model_id = "Zarinaaa/mt5-small-kyrgyz-normalization-ptft"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

	noisy = "барды жакшы болсун коркунучту жерлерди тазалаш керек"
	inputs = tokenizer("correct: " + noisy, return_tensors="pt", truncation=True, max_length=256)
	out = model.generate(**inputs, max_new_tokens=256, num_beams=4)
	print(tokenizer.decode(out[0], skip_special_tokens=True))
	```

	The prefix `"correct: "` is required.
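For more than one sentence, the snippet above can be wrapped in a small batching helper. This is a minimal sketch and not part of the original release: `add_prefix` and `normalize_batch` are hypothetical helper names and the batch size of 16 is an arbitrary choice, while the prefix, `max_length=256`, `max_new_tokens=256`, and `num_beams=4` are taken from the example above.

```python
PREFIX = "correct: "  # required task prefix, as stated in this model card


def add_prefix(texts):
    """Prepend the task prefix to every input string."""
    return [PREFIX + t for t in texts]


def normalize_batch(texts, model, tokenizer, batch_size=16):
    """Normalize a list of strings in mini-batches (hypothetical helper)."""
    import torch  # imported lazily so add_prefix works without torch installed

    outputs = []
    for start in range(0, len(texts), batch_size):
        chunk = add_prefix(texts[start:start + batch_size])
        # Pad to the longest sequence in the chunk; truncate as in the example.
        enc = tokenizer(chunk, return_tensors="pt", padding=True,
                        truncation=True, max_length=256)
        with torch.no_grad():
            generated = model.generate(**enc, max_new_tokens=256, num_beams=4)
        outputs.extend(
            tokenizer.batch_decode(generated, skip_special_tokens=True)
        )
    return outputs
```

With `model` and `tokenizer` loaded as shown above, `normalize_batch([noisy], model, tokenizer)` should return a one-element list containing the normalized string.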

	## Training procedure

	### Stage 1 — Continual pre-training

	- Corpus: 538 MB clean Kyrgyz text from news portals and books
	- Objective: T5-style span corruption (mask rate 0.15, mean span length 3)
	- Epochs: 3
	- Train/validation split: 98 / 2, seed 42; best checkpoint by validation loss
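The Stage 1 objective can be illustrated on a toy token list. The sketch below is a simplified, word-level version of T5-style span corruption using the mask rate and mean span length listed above. `span_corrupt` is a hypothetical helper, the spans here have near-equal lengths rather than randomly sampled ones, and the actual pre-training code may differ. Sentinels follow the `<extra_id_N>` naming used by T5-family tokenizers.

```python
import random


def span_corrupt(tokens, mask_rate=0.15, mean_span_len=3, seed=0):
    """Return (inputs, targets) for a simplified span-corruption example."""
    rng = random.Random(seed)
    n = len(tokens)
    num_masked = max(1, round(n * mask_rate))
    num_spans = max(1, round(num_masked / mean_span_len))

    # Split the masked budget into num_spans nearly equal span lengths.
    base, extra = divmod(num_masked, num_spans)
    span_lens = [base + (1 if i < extra else 0) for i in range(num_spans)]

    # Distribute the unmasked tokens over the gaps around the spans.
    # Interior gaps start with one token so that spans never touch.
    num_keep = n - num_masked
    gaps = [0] + [1] * (num_spans - 1) + [0]
    for _ in range(num_keep - (num_spans - 1)):
        gaps[rng.randrange(len(gaps))] += 1

    inputs, targets, pos = [], [], 0
    for i, span_len in enumerate(span_lens):
        inputs.extend(tokens[pos:pos + gaps[i]])   # kept tokens before span
        pos += gaps[i]
        sentinel = f"<extra_id_{i}>"
        inputs.append(sentinel)                    # span replaced in input
        targets.append(sentinel)
        targets.extend(tokens[pos:pos + span_len])  # span moved to target
        pos += span_len
    inputs.extend(tokens[pos:])                    # kept tokens after last span
    targets.append(f"<extra_id_{len(span_lens)}>")  # closing sentinel
    return inputs, targets
```

Calling `span_corrupt(sentence.split())` on a 40-word sentence masks 6 words in 2 spans. The target consists only of sentinels and tokens copied from the input, which is the property the copy-bias hypothesis in the failure analysis refers to.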

	### Stage 2 — Fine-tuning

	Identical to the fine-tune-only variant:

	- Effective batch size: 64 (per-device batch size 4 × 16 gradient accumulation steps)
	- Learning rate: 3e-4, cosine schedule, 500 warmup steps
	- Epochs: 5
	- Max sequence length: 256
	- Train/validation split: 95 / 5, seed 42; best checkpoint by validation loss
	- Hardware: 1× NVIDIA RTX 5080 (16 GB VRAM)
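To make the batch-size and schedule settings concrete, here is a small pure-Python sketch of the arithmetic. It assumes that "4 × 16" means a per-device batch of 4 with 16 accumulation steps, and that the cosine schedule uses linear warmup followed by decay to zero. The card does not state the exact schedule shape, so both are assumptions, and the function names are illustrative.

```python
import math

PEAK_LR = 3e-4        # from the list above
WARMUP_STEPS = 500    # from the list above
PER_DEVICE_BATCH = 4  # assumed reading of "4 x 16"
GRAD_ACCUM = 16       # assumed gradient accumulation steps
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM  # 64


def total_optimizer_steps(num_train_examples, epochs=5):
    """Number of optimizer updates for the given training-set size."""
    steps_per_epoch = math.ceil(num_train_examples / EFFECTIVE_BATCH)
    return steps_per_epoch * epochs


def lr_at(step, total_steps):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

If the 1.67M pairs are the full set before the 95 / 5 split (an assumption), the training portion is roughly 1,586,500 examples, and `total_optimizer_steps(1_586_500)` gives 123,950 updates, so the 500 warmup steps cover well under 1% of training.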

	## Evaluation

	Automatic metrics on the held-out 1,000-example test set:

	| Metric | Value |
	|---|---|
	| CER | 0.0825 ± 0.004 |
	| WER | 0.2017 |
	| Exact Match | 0.184 |
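For readers who want to score their own outputs with the same three metrics, the sketch below gives standard per-example definitions based on Levenshtein distance. It is an illustration rather than the evaluation script used for the table: the function names are hypothetical, and the card does not say whether scores were averaged per example or pooled over the corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(1, len(ref))


def wer(ref, hyp):
    """Word error rate: word edits divided by reference word count."""
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(1, len(ref_words))


def exact_match(ref, hyp):
    """1.0 if the hypothesis equals the reference exactly, else 0.0."""
    return float(ref == hyp)
```

A single wrong character in a four-character reference gives `cer` = 0.25, and one wrong word out of three gives `wer` = 1/3.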

	Compared with the fine-tune-only variant (CER 0.0796), a paired bootstrap test gives a two-sided p = 0.06. We treat this as insufficient evidence to reject the null hypothesis of no difference, not as evidence of equivalence: n = 1,000 is underpowered for detecting small effects in either direction.
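A paired bootstrap comparison of two systems on the same test examples can be sketched as follows. This is one common variant (resample per-example differences, centre them on the observed mean to simulate the null), and it is not necessarily the exact procedure used in the paper. The function name and defaults are illustrative.

```python
import random


def paired_bootstrap_p(scores_a, scores_b, num_resamples=10_000, seed=0):
    """Two-sided paired bootstrap p-value for the mean difference a - b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same examples")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    extreme = 0
    for _ in range(num_resamples):
        # Resample examples with replacement, keeping the pairing intact.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        # Centre on the observed mean to simulate the null of no difference.
        shifted = sum(resample) / n - observed
        if abs(shifted) >= abs(observed):
            extreme += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (extreme + 1) / (num_resamples + 1)
```

Here `scores_a` and `scores_b` would be per-example CER values for the two variants on the 1,000 test examples, in the same order.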

	Human evaluation (200 examples, 2 native-speaker annotators): 99.8% rated correct (Wilson 95% CI [98.6%, 99.96%]); PABAK = 0.990, Gwet's AC1 = 0.995. These results are identical to the fine-tune-only variant; both models are at the ceiling.
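The Wilson intervals quoted in this section and in the failure analysis follow the standard Wilson score formula, sketched below. The function name is illustrative, and z = 1.96 is the usual approximation for a 95% interval. The example uses the 35/40 count from the failure analysis, because the raw counts behind the human-evaluation figure are not listed here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


low, high = wilson_interval(35, 40)
print(f"{low:.3f} to {high:.3f}")
```

For 35 out of 40 this gives an interval of about 0.739 to 0.945, which matches the reported [74%, 95%] after rounding.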

	### Per-category CER

	| Category | N | FT-only | PT+FT |
	|---|---|---|---|
	| Punctuation | 849 | 0.078 | 0.081 |
	| Capitalization | 62 | 0.084 | 0.085 |
	| All-caps | 39 | 0.084 | 0.083 |
	| Digit–Word | 41 | 0.076 | 0.067 |

	PT+FT is numerically slightly better on Digit–Word compounds; with N = 41 we do not treat this as a robust advantage.

	### Failure analysis

	Among the 40 examples where the FT-only model beats PT+FT by more than 0.05 CER, hallucination (input repetition) is the dominant error mode (35/40 = 87.5%, 95% Wilson CI [74%, 95%]). We consider two non-exclusive hypotheses (see paper §6.1):

	1. Copy bias from span corruption: T5-style span corruption trains the decoder to reconstruct spans of the input verbatim, which may reinforce copying behavior that is harmful for normalization (where the target is usually not a superset of the input).
	2. Register mismatch: continual pre-training used clean, formal text (news and books), while fine-tuning targets the normalization of noisy, informal social-media text. The register gap may push the model toward fluent formal continuations that read as hallucinations.

	## Limitations

	Same as the fine-tune-only variant, plus:

	- Higher hallucination rate in failure cases — if you need maximum robustness, use the FT-only variant.
	- No measurable benefit from the additional pre-training at this scale and corpus composition; results suggest a more targeted continual objective (in-domain noisy text, denoising closer to the normalization target) would be needed.

	## Citation

	```bibtex
	@inproceedings{uvalieva2026kyrgyz,
	title={Kyrgyz Text Normalization: A Comparative Study of Neural and Rule-Based Approaches},
	author={Uvalieva, Zarina and Kumarbai uulu, Bektemir and Metinov, Adilet and Tashbaltaev, Tynchtykbek and Alibekov, Nurtilek},
	booktitle={Proceedings of the MeLLM Workshop at ACL 2026},
	year={2026}
	}
	```

	## License

	MIT. Code: [github.com/Zarina33/Kyrgyz-Text-Normalization-Conference](https://github.com/Zarina33/Kyrgyz-Text-Normalization-Conference).