base_model:
- mistralai/Mistral-7B-Instruct-v0.3
tags:
- llama-cpp
license: mit
datasets:
- SKT-NRS/SKT-OMNI-CORPUS-146T-V1
language:
- en
- hi
pipeline_tag: text-generation
library_name: transformers
SKT-OM (TIGER-OM) - Agentic RAG System
Advanced 13B Agentic RAG with Think Mode + Dynamic Plugins + LangGraph
Built for AMD Developer Hackathon 2026 on AMD Developer Cloud.
Project Overview
SKT-OM (also known as TIGER-OM) is a 13B-parameter, fully agentic Retrieval-Augmented Generation (RAG) system. It extends traditional RAG by integrating:
- Think Mode: Advanced multi-step reasoning engine
- Dynamic Plugin Architecture: Intelligent tool selection & execution
- LangGraph Multi-Agent Workflow: Stateful agent collaboration
- SKT RAG: High-performance retrieval pipeline
The system takes natural-language queries and returns reasoned responses, backed by tool use and a verification step.
Model Details
- Model Name: TIGER-OM (SKT-OM)
- Parameters: 13 Billion
- Base Model: Custom trained on AMD hardware
- Quantization: Q4_K_M (Excellent balance between quality and size)
- GGUF Format: Optimized for CPU + GPU inference
- Training Hardware: AMD Developer Cloud GPUs ($100 credits)
- Inference: ROCm 7.0 + vLLM (Full FP16) + GGUF (Q4_K_M)
The Q4_K_M version is intended to keep reasoning quality close to FP16 while using much less memory and running faster on consumer and professional hardware.
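As a rough illustration of the memory saving, the arithmetic below estimates weight storage for a 13B-parameter model. The figure of about 4.85 bits per weight for Q4_K_M is an assumption (an approximate average often quoted for llama.cpp K-quants, not a number from this model card), and the estimate covers weights only, ignoring the KV cache and runtime overhead.

```python
# Back-of-the-envelope weight-memory estimate (weights only, no KV cache).
PARAMS = 13e9          # 13 billion parameters, per the model card
FP16_BITS = 16         # bits per weight in FP16
Q4_K_M_BITS = 4.85     # ASSUMPTION: approximate average bits per weight


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


fp16_gb = weight_gb(PARAMS, FP16_BITS)
q4_gb = weight_gb(PARAMS, Q4_K_M_BITS)

print(f"FP16:   {fp16_gb:.1f} GB")
print(f"Q4_K_M: {q4_gb:.1f} GB")
print(f"Ratio:  {fp16_gb / q4_gb:.1f}x smaller")
```

Under these assumptions the weights shrink from about 26 GB to roughly 7.9 GB, which is what makes single-GPU or CPU inference practical.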
Key Features
- Think Mode Engine: Chain-of-Thought, Self-Reflection, Verification Loops, and Self-Critique
- Plugin Ecosystem: Code Runner, Math Solver, Web Search, Data Analyzer, Document Parser + Custom Plugins
- Advanced RAG: SKT RAG with query rewriting, multi-hop retrieval, reranking & contextual compression
- Multi-Agent System: LangGraph powered stateful workflow
- Memory: Persistent conversation state
- Tool Use: Dynamic plugin routing based on query intent
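The "dynamic plugin routing based on query intent" item can be pictured with a minimal sketch. Everything here is hypothetical: the keyword table, the `route_query` function, and the plugin identifiers are illustrative names, and the actual router may well rely on the model itself rather than keyword matching.

```python
import re

# HYPOTHETICAL keyword table: plugin names mirror the feature list above,
# but the trigger words are illustrative assumptions.
PLUGIN_KEYWORDS = {
    "code_runner": {"python", "code", "script", "run"},
    "math_solver": {"solve", "integral", "equation", "calculate"},
    "web_search": {"latest", "news", "today", "search"},
    "data_analyzer": {"csv", "dataset", "statistics", "plot"},
    "document_parser": {"pdf", "document", "parse", "extract"},
}


def route_query(query: str) -> list[str]:
    """Return plugins whose keywords appear in the query, best match first."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    scores = {name: len(words & kws) for name, kws in PLUGIN_KEYWORDS.items()}
    matched = [name for name, score in scores.items() if score > 0]
    # Sort by descending overlap; ties keep the table's declaration order.
    return sorted(matched, key=lambda name: -scores[name])


print(route_query("Solve this equation and plot the dataset"))
```

A query that matches no keywords returns an empty list, in which case a caller could fall back to plain retrieval and generation.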
Important Links
- Live Demo: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/SKT-OM
- Main Model Repo: Shrijanagain/TIGER-OM
- GGUF Quantized Models (Q4_K_M): Shrijanagain/TIGER-GGUF
- GitHub Repository (RAG + ADK): https://github.com/SHRIJANAGAIN/SKT-AMD-FILES
How It Works
graph TD
A[User Query] --> B[Think Mode]
B --> C[Decomposition & Planning]
C --> D[Plugin Router]
C --> E[SKT RAG Retrieval]
D --> F[Execute Plugins]
E --> G[Context Processing]
F & G --> H[Verification Loop]
H --> I[LangGraph Synthesis]
I --> J[Final Response]
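To make the execution order in the diagram concrete, here is a minimal sketch that encodes the same edges as a dependency map and derives a valid run order with Python's standard-library `graphlib`. It models ordering only; it is not the project's LangGraph code, and the node names are simply the labels from the diagram.

```python
from graphlib import TopologicalSorter

# Each key lists the nodes it depends on (its predecessors in the diagram).
DEPENDENCIES = {
    "Think Mode": {"User Query"},
    "Decomposition & Planning": {"Think Mode"},
    "Plugin Router": {"Decomposition & Planning"},
    "SKT RAG Retrieval": {"Decomposition & Planning"},
    "Execute Plugins": {"Plugin Router"},
    "Context Processing": {"SKT RAG Retrieval"},
    "Verification Loop": {"Execute Plugins", "Context Processing"},
    "LangGraph Synthesis": {"Verification Loop"},
    "Final Response": {"LangGraph Synthesis"},
}

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(DEPENDENCIES).static_order())
for step, node in enumerate(order, start=1):
    print(f"{step:2d}. {node}")
```

The plugin branch and the retrieval branch have no dependency on each other, so they could run concurrently; the verification step is the point where both must have finished.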
Technologies Used
- LLM: 13B TIGER-OM (Q4_K_M GGUF)
- RAG Framework: SKT RAG + ADK Kit
- Agent Framework: LangGraph
- GPU Stack: ROCm 7.0 + AMD ADK Kit
- Inference: vLLM (FP16) + llama.cpp (GGUF Q4_K_M)
- Hardware: AMD MI300X
- Cloud: AMD Developer Cloud
Quick Start - GGUF Q4_K_M
# Using llama.cpp
./llama-cli \
  -m tiger-om-q4_k_m.gguf \
  -p "Your complex query here..." \
  -n 1024 \
  -t 8 \
  --temp 0.7
Python Example (llama-cpp-python)
from llama_cpp import Llama

llm = Llama(
    model_path="tiger-om-q4_k_m.gguf",
    n_gpu_layers=-1,  # Use all GPU layers
    n_ctx=8192,
    verbose=False,
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain..."}],
    temperature=0.7,
    max_tokens=1024,
)
print(response["choices"][0]["message"]["content"])
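When pairing the model with a retriever, the retrieved passages have to be packed into the `messages` list passed to `create_chat_completion`. A minimal sketch, assuming a hypothetical `build_rag_messages` helper and an illustrative prompt wording (neither comes from the SKT RAG code):

```python
def build_rag_messages(question: str, passages: list[str]) -> list[dict]:
    """Pack retrieved passages and a question into a single user message."""
    # Number the passages so the model can refer to them as [1], [2], ...
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    # ASSUMPTION: illustrative instruction text, not the project's prompt.
    instructions = (
        "Answer using only the numbered context passages. "
        "Say so if the context is insufficient."
    )
    content = f"{instructions}\n\nContext:\n{context}\n\nQuestion: {question}"
    return [{"role": "user", "content": content}]


messages = build_rag_messages(
    "Which quantization does the GGUF build use?",
    ["The GGUF build is quantized to Q4_K_M.", "Inference runs on ROCm 7.0."],
)
print(messages[0]["content"])
```

The returned list can be passed as `messages=` to `llm.create_chat_completion`. Keeping everything in one user message sidesteps chat templates that do not accept a separate system role.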
Repository Structure
- /skt_ai_labs: Core ADK + RAG integration
- /plugins: Plugin system
- /agents: LangGraph workflows
- /examples: Ready-to-use examples
- /docs: Architecture & guides
Hackathon Information
- Event: AMD Developer Hackathon 2026
- Trained on: AMD Developer Cloud ($100 credits)
- Built in Public: Regular technical updates shared
- Goal: Showcase agentic AI on the AMD ROCm ecosystem
License
MIT