Dhee-NxtGen-Qwen3-Malayalam-v2

Model Description

Dhee-NxtGen-Qwen3-Malayalam-v2 is a large language model designed for natural and fluent Malayalam understanding and text generation.
It is built on the Qwen3 architecture and optimized for assistant-style dialogue, reasoning, function calling, and multi-turn conversations.

This model is part of DheeYantra’s multilingual LLM initiative, developed in collaboration with NxtGen Cloud Technologies Private Limited, to advance Indic conversational AI systems.

Key Features

Context-aware Malayalam text generation
Optimized for reasoning and function-calling use cases
Suitable for dialogue systems, summarization, and open-domain conversations
Fully compatible with 🤗 Hugging Face Transformers
Optimized for high-performance serving with vLLM

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "dheeyantra/dhee-nxtgen-qwen3-malayalam-v2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Example prompt in ChatML format.
# The Malayalam user turn means: "Can you schedule an appointment for me?"
prompt = """<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
നിങ്ങൾക്ക് എനിക്ക് ഒരു അപ്പോയിന്റ്മെന്റ് ഷെഡ്യൂൾ ചെയ്ത് തരാമോ?<|im_end|>
<|im_start|>assistant
"""

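# Tokenize the prompt and move the tensors to the model's device
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150)

# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))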
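Hand-writing the prompt string becomes error-prone in multi-turn conversations. The following is a minimal sketch of a helper that renders a list of role/content messages in the same `<|im_start|>` / `<|im_end|>` layout as the example prompt. The function name `build_chatml_prompt` is illustrative and not part of the model or of Transformers.

```python
from typing import Dict, List


def build_chatml_prompt(messages: List[Dict[str, str]]) -> str:
    """Render messages in the <|im_start|>/<|im_end|> layout of the example
    prompt, ending with an open assistant turn for the model to complete."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Leave the assistant turn open so generation continues from here
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "നിങ്ങൾക്ക് എനിക്ക് ഒരു അപ്പോയിന്റ്മെന്റ് ഷെഡ്യൂൾ ചെയ്ത് തരാമോ?"},
]
prompt = build_chatml_prompt(messages)
```

For follow-up turns, append the previous assistant reply and the next user message to `messages` and rebuild the prompt. If the tokenizer ships a chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the usual way to get the same kind of string from Transformers.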

Intended Uses & Limitations

Intended Uses

Malayalam conversational chatbots and assistants
Function-calling and structured response generation
Story generation and summarization in Malayalam
Natural dialogue systems for Indic AI applications
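For function-calling use, the application has to pull the structured call out of the generated text. The sketch below assumes the model follows the tool-call convention commonly used with Qwen chat templates, where a JSON object is wrapped in `<tool_call>` tags. This model card does not specify the output format, so treat the tag name as an assumption to verify against real outputs. The tool name `schedule_appointment` and its arguments are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# Assumed wrapper format: <tool_call>{...JSON...}</tool_call>
TOOL_CALL_PATTERN = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return every well-formed tool call found in the generated text."""
    calls = []
    for match in TOOL_CALL_PATTERN.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole reply
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


# Hypothetical model output for the appointment request
sample_output = (
    "<tool_call>\n"
    '{"name": "schedule_appointment", '
    '"arguments": {"date": "2025-01-15", "time": "10:00"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(sample_output)
```

Each returned dictionary can then be dispatched to the matching application function by its `name` key.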

Limitations

May generate inaccurate or biased responses in rare cases
Performance can vary on out-of-domain or code-mixed inputs
Primarily optimized for Malayalam; other languages may produce less fluent results

vLLM / High-Performance Serving Requirements

For high-throughput serving with vLLM, ensure the following environment:

GPU with compute capability ≥ 8.0 (e.g., NVIDIA A100)
PyTorch 2.1+ and CUDA toolkit installed
For V100 GPUs (sm70), vLLM GPU inference is not supported; CPU fallback is possible but slower.
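The compute capability requirement can be checked before installing vLLM. This is a minimal sketch that uses PyTorch's `torch.cuda.get_device_capability` and compares the result against the 8.0 threshold listed above. The helper names are illustrative, and the check only looks at GPU 0.

```python
from typing import Optional, Tuple

MIN_CAPABILITY = (8, 0)  # threshold taken from the requirements list


def meets_requirement(
    capability: Tuple[int, int],
    minimum: Tuple[int, int] = MIN_CAPABILITY,
) -> bool:
    """Compare (major, minor) compute capability tuples."""
    return tuple(capability) >= tuple(minimum)


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the compute capability of GPU 0, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


capability = detect_capability()
if capability is None:
    print("No CUDA GPU detected, or PyTorch is not installed.")
elif meets_requirement(capability):
    print(f"Compute capability {capability} meets the requirement.")
else:
    print(f"Compute capability {capability} is below {MIN_CAPABILITY}.")
```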

Install dependencies:

pip install torch transformers vllm sentencepiece

Run vLLM server:

vllm serve dheeyantra/dhee-nxtgen-qwen3-malayalam-v2 --host 0.0.0.0 --port 8000
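Once the server is running, it exposes an OpenAI-compatible HTTP API. The sketch below sends a chat request to the `/v1/chat/completions` route using only the Python standard library. It assumes the server is reachable at `http://localhost:8000` (matching the port above) and the helper names `build_chat_request` and `chat` are illustrative.

```python
import json
import urllib.request

MODEL = "dheeyantra/dhee-nxtgen-qwen3-malayalam-v2"


def build_chat_request(base_url: str, user_text: str, max_tokens: int = 150):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_text},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, user_text: str) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, user_text)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires the vLLM server to be running):
# print(chat("http://localhost:8000", "നിങ്ങൾക്ക് എനിക്ക് ഒരു അപ്പോയിന്റ്മെന്റ് ഷെഡ്യൂൾ ചെയ്ത് തരാമോ?"))
```

With this route the server applies the model's chat template, so the client sends plain messages rather than a hand-built `<|im_start|>` prompt.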

License

Released under the Apache 2.0 License.

Developed by DheeYantra in collaboration with NxtGen Cloud Technologies Pvt. Ltd.

Model size: 2B params (Safetensors, tensor type F16)
