---
license: mit
language:
- en
- hi
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
tags:
- agent
- Qwen
- AI
- ST-X-0
- MIXTRAL
- TIGER OM
library_name: transformers
inference:
  parameters:
    temperature: 0.7
    max_new_tokens: 500
widget:
- text: "What are the latest trends in retrieval-augmented generation?"
  example_title: "General Query"
---
---
# TIGER-OM (SKT-OM) - 13B MoE Agentic Model
A **13B Mixture-of-Experts (MoE) model** optimized for agentic RAG, with a Think Mode and a plugin architecture.
Built for the **AMD Developer Hackathon 2026** on AMD Developer Cloud.
---
## Model Details
- **Model Name**: TIGER-OM (SKT-OM)
- **Architecture**: **Mixture of Experts (MoE)**
- **Total Parameters**: 13B (the number of parameters active per token is lower because of MoE sparsity)
- **Base Models**:
  - Primary Base: **Shrijanagain/ST-X-0**
  - Expert Integration: **Mistral-7B**
- **Format**: **Safetensors** (safe and fast to load)
- **Quantization**: FP16 / BF16 (original weights); a Q4_K_M GGUF build is available in a separate repo
- **Context Length**: 8192 tokens
- **Training Hardware**: AMD Developer Cloud GPUs ($100 developer credits)
- **Inference Optimized**: ROCm 7.0 + vLLM + AMD MI300X
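
As a rough guide to what the precision options above mean for hardware, the sketch below estimates the memory needed for the weights alone. It assumes "13B" means exactly 13e9 parameters and ignores the KV cache, activations, and framework overhead, so treat the figures as lower bounds. Note that with an MoE model all experts must be resident in memory, even though only some of them run for each token.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


# Assumption: "13B" from the model card is treated as exactly 13e9 parameters.
TOTAL_PARAMS = 13e9

for label, bits in [("FP16 / BF16", 16), ("4-bit (idealized)", 4)]:
    print(f"{label}: ~{weight_memory_gib(TOTAL_PARAMS, bits):.1f} GiB")
# FP16 / BF16: ~24.2 GiB
# 4-bit (idealized): ~6.1 GiB
```

Real Q4_K_M files are somewhat larger than the idealized 4-bit figure, because the format mixes quantization types and stores scaling metadata alongside the weights.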
---
## Key Features
- **True MoE Architecture**: sparse activation for better efficiency and performance
- **Think Mode Reasoning**: advanced chain-of-thought, planning, self-reflection, and verification
- **Dynamic Plugin System**: intelligent routing to Code, Math, Search, and Data Analysis plugins
- **Agentic Capabilities**: full LangGraph multi-agent workflow
- **Advanced RAG Integration**: SKT RAG + query rewriting + multi-hop retrieval + reranking
- **Stateful Memory**: persistent conversation context
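
To make the plugin-routing idea concrete, here is a deliberately simple keyword-based router. It is only an illustration: the plugin names mirror the list above, but the keyword table and the `route` function are hypothetical and are not TIGER-OM's actual routing logic, which this card does not describe in detail.

```python
from typing import Dict, Tuple

# Hypothetical keyword table; a real system would likely let the model choose.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "code": ("python", "function", "bug", "compile"),
    "math": ("calculate", "integral", "equation", "solve"),
    "search": ("latest", "news", "find"),
    "data": ("csv", "dataframe", "plot"),
}


def route(query: str) -> str:
    """Return the plugin with the most keyword hits, or 'none' if nothing matches."""
    text = query.lower()
    scores = {
        name: sum(word in text for word in words)
        for name, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "none"


print(route("Calculate the integral of x^2"))       # math
print(route("Fix the bug in this Python function"))  # code
print(route("Hello there"))                          # none
```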
---
## Architecture Breakdown
**TIGER-OM** is built on a **13B MoE** backbone:
- **Base**: Shrijanagain/ST-X-0 (strong foundational model)
- **Experts**: Mistral-7B-derived expert layers, fine-tuned for specialized reasoning and tool use
- **Router Network**: Learned gating mechanism for expert selection
- **Think Mode Layer**: Custom system prompt + reasoning controller
- **Plugin Head**: Tool calling & execution layer
This hybrid approach (ST-X-0 base plus Mistral-7B experts) is designed to combine strong reasoning, code understanding, and general-purpose ability with the efficiency of sparse MoE activation.
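
The router network can be pictured with a small top-k gating sketch. This follows the common Mixtral-style scheme (keep the k highest router logits, then softmax over just those); whether TIGER-OM uses exactly this scheme, and with which k, is an assumption, since the card only says the gate is learned.

```python
import math
from typing import List, Tuple


def top_k_gate(router_logits: List[float], k: int = 2) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their logits."""
    order = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def mix_expert_outputs(expert_outputs: List[float], gates: List[Tuple[int, float]]) -> float:
    """Weighted sum of the selected experts' outputs (scalars here for brevity)."""
    return sum(weight * expert_outputs[index] for index, weight in gates)


# Toy example: four experts, the router prefers experts 1 and 3 equally.
gates = top_k_gate([0.1, 2.0, -1.0, 2.0], k=2)
print(gates)                                               # [(1, 0.5), (3, 0.5)]
print(mix_expert_outputs([10.0, 20.0, 30.0, 40.0], gates)) # 30.0
```

Only the selected experts need to run for a given token, which is where the efficiency of sparse activation comes from.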
---
## Files in this Repo (Safetensors)
- `model-00001-of-0000X.safetensors`: main model weights
- `config.json`
- `tokenizer.json` / `tokenizer_config.json`
- `generation_config.json`
- `special_tokens_map.json`
- `model.safetensors.index.json`
**All weights are in the `safetensors` format**, so loading them carries no pickle-related code-execution risk.
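
For sharded checkpoints, `model.safetensors.index.json` holds a `weight_map` that maps each tensor name to the shard file storing it. The sketch below groups tensors by shard. The tensor and shard names in `example_index` are invented stand-ins, not this repo's real contents.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def load_index(path: str) -> dict:
    """Read a sharded-checkpoint index such as model.safetensors.index.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def tensors_per_shard(index: dict) -> Dict[str, List[str]]:
    """Group tensor names by the shard file that stores them."""
    shards: Dict[str, List[str]] = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)


# Stand-in index with invented names (assumption, for illustration only).
example_index = {
    "metadata": {"total_size": 1024},
    "weight_map": {
        "embed.weight": "shard-a.safetensors",
        "layer0.weight": "shard-a.safetensors",
        "head.weight": "shard-b.safetensors",
    },
}

for shard, names in tensors_per_shard(example_index).items():
    print(shard, len(names))
```

With a local copy of the repo, `tensors_per_shard(load_index("model.safetensors.index.json"))` would give the same breakdown for the real shards.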
---
## How to Use (Safetensors)
```python
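from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Shrijanagain/TIGER-OM"

# Load the tokenizer and the model weights (Safetensors shards).
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # BF16 halves memory relative to FP32
    device_map="auto",           # requires the `accelerate` package
    trust_remote_code=True,      # runs custom code from the repo; review it first
)

prompt = """You are SKT-OM, an advanced agentic AI with Think Mode enabled.
User Query: Calculate training cost comparison and suggest best option..."""

# Tokenize and move the inputs to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a response.
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)

# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))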
```
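
For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the decoded text above repeats the prompt. A small helper can keep only the continuation. The sketch uses plain lists with made-up token ids so it runs without downloading the model; `continuation_ids` is a hypothetical helper name.

```python
from typing import List, Sequence


def continuation_ids(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Drop the first `prompt_length` ids, keeping only newly generated ones."""
    return list(output_ids[prompt_length:])


# Stand-in token ids (assumption, for illustration only).
prompt_ids = [101, 7592, 2088]
output_ids = prompt_ids + [2023, 2003, 102]

print(continuation_ids(output_ids, len(prompt_ids)))  # [2023, 2003, 102]
```

With the tensors from the snippet above, the equivalent slice is `outputs[0][inputs["input_ids"].shape[1]:]`, which can then be passed to `tokenizer.decode`.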
---
## Important Links
- **Live Demo**: [SKT-OM Space](https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/SKT-OM)
- **GGUF Quantized (Q4_K_M)**: [Shrijanagain/TIGER-GGUF](https://huggingface.co/Shrijanagain/TIGER-GGUF)
- **GitHub (RAG + ADK Code)**: [SHRIJANAGAIN/SKT-AMD-FILES](https://github.com/SHRIJANAGAIN/SKT-AMD-FILES)
---
## Technologies & Stack
- **Base Models**: Shrijanagain/ST-X-0 + Mistral-7B Experts
- **RAG**: SKT RAG + AMD ADK Kit
- **Agents**: LangGraph
- **Hardware**: AMD MI300X + ROCm 7.0
- **Inference**: vLLM (FP16) + transformers (Safetensors)
- **Training**: AMD Developer Cloud
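
Since vLLM is part of the inference stack, here is a minimal sketch of querying the model through vLLM's OpenAI-compatible `/v1/completions` endpoint using only the standard library. It assumes a server was started separately (for example with `vllm serve "Shrijanagain/TIGER-OM"`) and is listening on vLLM's default port 8000. The helper names and default sampling values are illustrative choices, not settings prescribed by this card.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # assumption: local vLLM server
    model: str = "Shrijanagain/TIGER-OM",
    max_tokens: int = 256,
    temperature: float = 0.7,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first completion choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


request = build_completion_request("Once upon a time,")
print(request.full_url)  # http://localhost:8000/v1/completions
```

Calling `send(request)` performs the actual network call, so it only works while the server is running.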
---
## Performance
- Balances **quality and efficiency**: with sparse expert activation, only part of the 13B parameters is used for each token
- Tuned for reasoning, tool use, code, and multi-step tasks
- Lower per-token inference cost than dense 13B+ models, since only the routed experts run for each token
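
The cost argument can be illustrated with simple arithmetic on total versus active parameters. The split below (1B shared parameters, four experts of 3B each, two experts routed per token) is entirely hypothetical and chosen only so that the total comes to 13B. It is not TIGER-OM's published configuration.

```python
from typing import Tuple


def moe_param_counts(
    shared: float, per_expert: float, num_experts: int, top_k: int
) -> Tuple[float, float]:
    """Return (total, active-per-token) parameter counts for a simple MoE layout."""
    total = shared + per_expert * num_experts
    active = shared + per_expert * top_k
    return total, active


# Hypothetical layout (assumption): 1B shared + 4 experts x 3B, top-2 routing.
total, active = moe_param_counts(shared=1e9, per_expert=3e9, num_experts=4, top_k=2)
print(f"total={total / 1e9:.0f}B active={active / 1e9:.0f}B ratio={active / total:.2f}")
# total=13B active=7B ratio=0.54
```

Per-token compute scales roughly with the active count, while memory still scales with the total, since every expert has to be loaded.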
---
## Use Cases
- Complex technical Q&A
- Agentic workflows & tool calling
- Research assistance
- Code generation & debugging
- Mathematical & logical reasoning
- Comparative analysis
- Data analysis with plugins
---
## Hackathon
Built for the **AMD Developer Hackathon 2026**.
Trained entirely on **AMD Developer Cloud**.
Built fully in public, with multiple technical updates published.
---
## License
MIT License
---