VibeVoice Egyptian Arabic — cfg_scale=3.5

VibeVoice fine-tuned on the Egyptian Arabic dialect (checkpoint-9160).

cfg_scale=3.5

Natural prosody, moderate voice cloning. Good for conversational TTS.

Lowest safe value. Values below 3.0 cause the model to over-generate past the end of the script.
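
The threshold above can be enforced before a run with a small guard. This is a minimal sketch: `check_cfg_scale`, `MIN_CFG_SCALE`, and `DEFAULT_CFG_SCALE` are hypothetical names introduced for illustration and are not part of VibeVoice. The numbers are the ones stated on this card.

```python
import warnings

# Values taken from this card; the names are illustrative assumptions.
MIN_CFG_SCALE = 3.0      # below this, over-generation past the script end is reported
DEFAULT_CFG_SCALE = 3.5  # the value this checkpoint is published with


def check_cfg_scale(cfg_scale=DEFAULT_CFG_SCALE):
    """Return cfg_scale unchanged, warning if it is below the reported floor."""
    if cfg_scale < MIN_CFG_SCALE:
        warnings.warn(
            f"cfg_scale={cfg_scale} is below {MIN_CFG_SCALE}; "
            "the model may keep generating past the end of the script."
        )
    return cfg_scale
```

The guard only warns and never changes the value, so the caller stays free to experiment with lower settings.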

Repo contents

File	Description
`model.safetensors`	Merged model weights (5.1 GB, single shard)
`config.json`	Model architecture config
`tokenizer.json` + friends	Qwen2.5 tokenizer files
`preprocessor_config.json`	Audio processor settings
`voices/egyptian_male.wav`	Reference voice for male speaker
`voices/egyptian_female.wav`	Reference voice for female speaker
`samples/demo_cfg3.5.wav`	Sample output at this cfg_scale

Backend usage

from huggingface_hub import snapshot_download

# Download the whole repo (weights, tokenizer, reference voices) into ./model
snapshot_download(repo_id="MohammedEhab20/vibe-voice-egyptian-cfg35", local_dir="./model")
# Reference voices shipped with the repo
male_voice   = "./model/voices/egyptian_male.wav"
female_voice = "./model/voices/egyptian_female.wav"
# Run inference:
# python inference_from_file.py --model_path ./model --cfg_scale 3.5 ...
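
Before running inference, it can help to confirm that the download produced the files listed under "Repo contents". The sketch below assumes the `./model` directory used above. `missing_files` and `EXPECTED_FILES` are hypothetical helper names, and only `tokenizer.json` is checked from the tokenizer set because the other tokenizer files are not named on this card.

```python
from pathlib import Path

# Relative paths taken from the "Repo contents" table on this card.
EXPECTED_FILES = [
    "model.safetensors",
    "config.json",
    "tokenizer.json",
    "preprocessor_config.json",
    "voices/egyptian_male.wav",
    "voices/egyptian_female.wav",
]


def missing_files(model_dir, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not regular files under model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("./model")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

An empty list from `missing_files("./model")` means the weights, configs, and both reference voices are in place.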

Safetensors

Model size: 3B params
Tensor type: F16