base_model:
- mistralai/Mistral-7B-Instruct-v0.3
tags:
- llama-cpp
license: mit
datasets:
- SKT-NRS/SKT-OMNI-CORPUS-146T-V1
language:
- en
- hi
pipeline_tag: text-generation
library_name: transformers

# SKT-OM (TIGER-OM) - Agentic RAG System

**Advanced 13B Agentic RAG with Think Mode + Dynamic Plugins + LangGraph**

Built for **AMD Developer Hackathon 2026** on AMD Developer Cloud.

---

## Project Overview

**SKT-OM** (also known as **TIGER-OM**) is a **13B-parameter, fully agentic Retrieval-Augmented Generation (RAG)** system. It extends traditional RAG by integrating:

- **Think Mode**: Advanced multi-step reasoning engine
- **Dynamic Plugin Architecture**: Intelligent tool selection and execution
- **LangGraph Multi-Agent Workflow**: Stateful agent collaboration
- **SKT RAG**: High-performance retrieval pipeline

The system takes natural-language queries and returns reasoned responses backed by tool use and verification.

---

## Model Details

- **Model Name**: TIGER-OM (SKT-OM)
- **Parameters**: 13 billion
- **Base Model**: Custom trained on AMD hardware
- **Quantization**: **Q4_K_M** (a good balance between quality and size)
- **GGUF Format**: Optimized for CPU + GPU inference
- **Training Hardware**: AMD Developer Cloud GPUs ($100 credits)
- **Inference**: ROCm 7.0 + vLLM (full FP16) + GGUF (Q4_K_M)

The **Q4_K_M version** aims for reasoning quality close to FP16 while using much less memory and running faster on consumer and professional hardware.
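
As a rough illustration of the memory savings described above, the sketch below estimates weight storage for a 13-billion-parameter model at FP16 and at Q4_K_M. The value of about 4.85 bits per weight for Q4_K_M is an assumption taken from typical llama.cpp figures, not a measurement of this model, and the estimate ignores the KV cache and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 13e9    # parameter count stated in the model card
FP16_BITS = 16.0     # full half-precision weights
Q4_K_M_BITS = 4.85   # ASSUMPTION: typical average for Q4_K_M, not measured here

fp16_gb = weight_memory_gb(NUM_PARAMS, FP16_BITS)
q4_gb = weight_memory_gb(NUM_PARAMS, Q4_K_M_BITS)

print(f"FP16 weights:   {fp16_gb:.1f} GB")
print(f"Q4_K_M weights: {q4_gb:.1f} GB")
print(f"Reduction:      {fp16_gb / q4_gb:.1f}x")
```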

---

## Key Features

- **Think Mode Engine**: Chain-of-Thought, Self-Reflection, Verification Loops, and Self-Critique
- **Plugin Ecosystem**: Code Runner, Math Solver, Web Search, Data Analyzer, Document Parser, and custom plugins
- **Advanced RAG**: SKT RAG with query rewriting, multi-hop retrieval, reranking, and contextual compression
- **Multi-Agent System**: LangGraph-powered stateful workflow
- **Memory**: Persistent conversation state
- **Tool Use**: Dynamic plugin routing based on query intent
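
To make "dynamic plugin routing based on query intent" concrete, here is a minimal sketch of a keyword-based router. It is not the project's implementation: the registry layout, the keyword sets, and the handler functions are all hypothetical, and a real router might use the LLM itself to classify intent.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Plugin = Callable[[str], str]


def math_solver(query: str) -> str:
    """Toy handler: adds every integer found in the query."""
    numbers = [int(n) for n in re.findall(r"-?\d+", query)]
    return f"sum={sum(numbers)}"


def web_search(query: str) -> str:
    """Stub handler: a real plugin would call a search backend."""
    return f"[stub] would search the web for: {query}"


def code_runner(query: str) -> str:
    """Stub handler: a real plugin would execute code in a sandbox."""
    return "[stub] would run the supplied code"


# HYPOTHETICAL registry: plugin name -> (intent keywords, handler).
REGISTRY: Dict[str, Tuple[Set[str], Plugin]] = {
    "math_solver": ({"sum", "add", "calculate", "solve"}, math_solver),
    "web_search": ({"search", "latest", "news", "find"}, web_search),
    "code_runner": ({"run", "code", "execute", "script"}, code_runner),
}


def route(query: str) -> List[str]:
    """Return plugins whose keywords appear in the query, best match first."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scored = [(len(words & kws), name) for name, (kws, _) in REGISTRY.items()]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for score, name in ranked if score > 0]


def run_plugins(query: str) -> Dict[str, str]:
    """Execute every routed plugin and collect the results by plugin name."""
    return {name: REGISTRY[name][1](query) for name in route(query)}


print(run_plugins("Calculate the sum of 2 and 40"))
```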

---

## Important Links

- **Live Demo**: [https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/SKT-OM](https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/SKT-OM)
- **Main Model Repo**: [Shrijanagain/TIGER-OM](https://huggingface.co/Shrijanagain/TIGER-OM)
- **GGUF Quantized Models (Q4_K_M)**: [Shrijanagain/TIGER-GGUF](https://huggingface.co/Shrijanagain/TIGER-GGUF)
- **GitHub Repository (RAG + ADK)**: [https://github.com/SHRIJANAGAIN/SKT-AMD-FILES](https://github.com/SHRIJANAGAIN/SKT-AMD-FILES)

---

## How It Works

```mermaid
graph TD
    A[User Query] --> B[Think Mode]
    B --> C[Decomposition & Planning]
    C --> D[Plugin Router]
    C --> E[SKT RAG Retrieval]
    D --> F[Execute Plugins]
    E --> G[Context Processing]
    F & G --> H[Verification Loop]
    H --> I[LangGraph Synthesis]
    I --> J[Final Response]
```
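
The stages in the diagram can be approximated in plain Python, as sketched below. Every function is a hypothetical stub standing in for the real component (the actual system is described as using LangGraph, which this sketch does not use); it only illustrates the order of the stages and the verification loop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class State:
    """Shared state passed between stages (field names are illustrative)."""
    query: str
    plan: List[str] = field(default_factory=list)
    tool_results: List[str] = field(default_factory=list)
    contexts: List[str] = field(default_factory=list)
    verified: bool = False
    answer: str = ""


def think(state: State) -> State:
    # Think Mode + Decomposition & Planning: naive split into sub-tasks.
    state.plan = [p.strip() for p in state.query.split(" and ") if p.strip()]
    return state


def execute_plugins(state: State) -> State:
    # Plugin Router + Execute Plugins: one stub tool call per planned step.
    state.tool_results = [f"tool({step})" for step in state.plan]
    return state


def retrieve(state: State) -> State:
    # SKT RAG Retrieval + Context Processing: one stub document per step.
    state.contexts = [f"doc({step})" for step in state.plan]
    return state


def verify(state: State) -> State:
    # Verification Loop: every planned step needs a tool result and a context.
    state.verified = bool(state.plan) and (
        len(state.tool_results) == len(state.plan) == len(state.contexts)
    )
    return state


def synthesize(state: State) -> State:
    # Synthesis: combine all evidence into the final response.
    evidence = state.tool_results + state.contexts
    state.answer = f"Answer to '{state.query}' using {len(evidence)} pieces of evidence"
    return state


def run_pipeline(query: str, max_attempts: int = 2) -> State:
    state = think(State(query=query))
    for _ in range(max_attempts):  # retry until verification passes
        state = verify(retrieve(execute_plugins(state)))
        if state.verified:
            break
    return synthesize(state)


result = run_pipeline("compare ROCm and CUDA")
print(result.plan)
print(result.answer)
```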

---

## Technologies Used

- **LLM**: 13B TIGER-OM (Q4_K_M GGUF)
- **RAG Framework**: SKT RAG + ADK Kit
- **Agent Framework**: LangGraph
- **GPU Stack**: ROCm 7.0 + AMD ADK Kit
- **Inference**: vLLM (FP16) + llama.cpp (GGUF Q4_K_M)
- **Hardware**: AMD MI300X
- **Cloud**: AMD Developer Cloud

---

## Quick Start - GGUF Q4_K_M

```bash
# Using llama.cpp
./llama-cli \
  -m tiger-om-q4_k_m.gguf \
  -p "Your complex query here..." \
  -n 1024 \
  -t 8 \
  --temp 0.7
```

**Python Example (llama-cpp-python)**

```python
from llama_cpp import Llama

# Load the quantized model from a local GGUF file
llm = Llama(
    model_path="tiger-om-q4_k_m.gguf",
    n_gpu_layers=-1,  # Offload all layers to the GPU
    n_ctx=8192,       # Context window size in tokens
    verbose=False,
)

# Send a single-turn chat request
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain..."}],
    temperature=0.7,
    max_tokens=1024,
)

print(response["choices"][0]["message"]["content"])
```
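
For long answers it can help to print tokens as they arrive. The sketch below is one possible way to do that with llama-cpp-python's `stream=True` option, which yields OpenAI-style chunks carrying a `delta` field. The helper names `collect_stream` and `stream_answer` are hypothetical, and the model path is simply the one used in the quick start above.

```python
from typing import Any, Dict, Iterable


def collect_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Print text deltas from OpenAI-style streaming chunks and return the full text."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:  # role-only and empty deltas carry no text
            parts.append(text)
            print(text, end="", flush=True)
    print()
    return "".join(parts)


def stream_answer(prompt: str, model_path: str = "tiger-om-q4_k_m.gguf") -> str:
    """Stream a chat completion from a local GGUF model."""
    # Imported lazily so collect_stream works without llama-cpp-python installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=8192, verbose=False)
    chunks = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=1024,
        stream=True,  # return an iterator of chunks instead of one response
    )
    return collect_stream(chunks)
```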

---

## Repository Structure

- `/skt_ai_labs`: Core ADK + RAG integration
- `/plugins`: Plugin system
- `/agents`: LangGraph workflows
- `/examples`: Ready-to-use examples
- `/docs`: Architecture and guides

---

## Hackathon Information

- **Event**: AMD Developer Hackathon 2026
- **Trained on**: AMD Developer Cloud ($100 credits)
- **Built in Public**: Regular technical updates shared
- **Goal**: Showcase agentic AI on the AMD ROCm ecosystem

---

## License

*MIT*