TIGER-GGUF / README.md

Shrijanagain

metadata

base_model:
  - mistralai/Mistral-7B-Instruct-v0.3
tags:
  - llama-cpp
license: mit
datasets:
  - SKT-NRS/SKT-OMNI-CORPUS-146T-V1
language:
  - en
  - hi
pipeline_tag: text-generation
library_name: transformers

🚀 SKT-OM (TIGER-OM) - Agentic RAG System

Advanced 13B Agentic RAG with Think Mode + Dynamic Plugins + LangGraph

Built for AMD Developer Hackathon 2026 on AMD Developer Cloud.

🌟 Project Overview

SKT-OM (also known as TIGER-OM) is a 13B-parameter, fully agentic Retrieval-Augmented Generation (RAG) system. It extends traditional RAG by integrating:

Think Mode — Advanced multi-step reasoning engine
Dynamic Plugin Architecture — Intelligent tool selection & execution
LangGraph Multi-Agent Workflow — Stateful agent collaboration
SKT RAG — High-performance retrieval pipeline

The system takes a natural-language query, plans how to answer it, retrieves supporting context, calls tools where needed, and returns a reasoned response that has passed a verification step.

📊 Model Details

Model Name: TIGER-OM (SKT-OM)
Parameters: 13 Billion
Base Model: Custom trained on AMD hardware
Quantization: Q4_K_M (Excellent balance between quality and size)
GGUF Format: Optimized for CPU + GPU inference
Training Hardware: AMD Developer Cloud GPUs ($100 credits)
Inference: ROCm 7.0 + vLLM (Full FP16) + GGUF (Q4_K_M)

The Q4_K_M version is intended to keep reasoning quality close to the FP16 model while using far less memory and running faster on consumer and professional hardware.
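
As a rough illustration of the memory saving, the arithmetic below estimates weight storage for a 13B-parameter model at FP16 and at Q4_K_M. The figure of about 4.85 bits per weight for Q4_K_M is an approximate, commonly quoted average and is an assumption here, not a measurement of this model. KV cache and runtime overhead are ignored.

```python
# Back-of-the-envelope weight-memory estimate (weights only, no KV cache).
PARAMS = 13e9        # parameter count stated in this README
BITS_FP16 = 16.0     # FP16 stores each weight in 16 bits
BITS_Q4_K_M = 4.85   # ASSUMPTION: approximate average bits per weight for Q4_K_M


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


fp16_gb = weight_gb(PARAMS, BITS_FP16)
q4_gb = weight_gb(PARAMS, BITS_Q4_K_M)

print(f"FP16:   {fp16_gb:.1f} GB")
print(f"Q4_K_M: {q4_gb:.1f} GB")
print(f"Ratio:  {fp16_gb / q4_gb:.1f}x smaller")
```

Under these assumptions the quantized weights need roughly a third of the FP16 footprint. Actual file sizes depend on the tensor mix in the GGUF file.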

✨ Key Features

Think Mode Engine: Chain-of-Thought, Self-Reflection, Verification Loops, and Self-Critique
Plugin Ecosystem: Code Runner, Math Solver, Web Search, Data Analyzer, Document Parser + Custom Plugins
Advanced RAG: SKT RAG with query rewriting, multi-hop retrieval, reranking & contextual compression
Multi-Agent System: LangGraph powered stateful workflow
Memory: Persistent conversation state
Tool Use: Dynamic plugin routing based on query intent
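
The idea of routing a query to plugins by intent can be sketched as follows. This is a minimal illustration, not the project's router: the `Plugin` class, the `register` and `route` helpers, and keyword matching as the intent signal are all hypothetical stand-ins, and the plugin bodies are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Plugin:
    name: str
    keywords: List[str]        # hypothetical intent triggers
    run: Callable[[str], str]  # takes the query, returns a result string


REGISTRY: Dict[str, Plugin] = {}


def register(plugin: Plugin) -> None:
    """Add a plugin at runtime so new tools need no router changes."""
    REGISTRY[plugin.name] = plugin


def route(query: str) -> List[Plugin]:
    """Return every registered plugin whose keywords appear in the query."""
    text = query.lower()
    return [p for p in REGISTRY.values() if any(k in text for k in p.keywords)]


# Placeholder plugins; real ones would execute code, call a search API, etc.
register(Plugin("math_solver", ["solve", "integral", "equation"], lambda q: f"[math] {q}"))
register(Plugin("web_search", ["latest", "news", "search"], lambda q: f"[search] {q}"))
register(Plugin("code_runner", ["python", "run code"], lambda q: f"[code] {q}"))

selected = route("Solve this equation and search for the latest ROCm release")
print([p.name for p in selected])
```

A production router would more likely classify intent with the model itself or with embeddings. The registry pattern stays the same either way, since custom plugins only need to call `register`.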

🔗 Important Links

Live Demo: https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/SKT-OM
Main Model Repo: Shrijanagain/TIGER-OM
GGUF Quantized Models (Q4_K_M): Shrijanagain/TIGER-GGUF
GitHub Repository (RAG + ADK): https://github.com/SHRIJANAGAIN/SKT-AMD-FILES

How It Works

graph TD
    A[User Query] --> B[Think Mode]
    B --> C[Decomposition & Planning]
    C --> D[Plugin Router]
    C --> E[SKT RAG Retrieval]
    D --> F[Execute Plugins]
    E --> G[Context Processing]
    F & G --> H[Verification Loop]
    H --> I[LangGraph Synthesis]
    I --> J[Final Response]
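
The flow in the diagram can be sketched in plain Python as a chain of functions that pass a shared state dictionary. This is not the project's LangGraph code: every function body is a hypothetical stub, and the plugin and retrieval branches run one after the other here rather than side by side.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def think(state: State) -> State:
    # Think Mode: record the raw query as the first reasoning step.
    state["thoughts"] = [f"understand: {state['query']}"]
    return state


def plan(state: State) -> State:
    # Decomposition & Planning: naive split on " and " (stand-in for an LLM call).
    state["subtasks"] = [s.strip() for s in state["query"].split(" and ")]
    return state


def run_plugins(state: State) -> State:
    # Plugin Router + Execute Plugins: stubbed tool result per subtask.
    state["tool_results"] = [f"tool({t})" for t in state["subtasks"]]
    return state


def retrieve(state: State) -> State:
    # SKT RAG Retrieval + Context Processing: stubbed document lookup.
    state["context"] = [f"doc about {t}" for t in state["subtasks"]]
    return state


def verify(state: State) -> State:
    # Verification Loop: pass only if every subtask has a result and context.
    n = len(state["subtasks"])
    state["verified"] = len(state["tool_results"]) == n and len(state["context"]) == n
    return state


def synthesize(state: State) -> State:
    # Synthesis: merge tool results and context into the final response.
    parts = state["tool_results"] + state["context"]
    state["response"] = "; ".join(parts) if state["verified"] else "verification failed"
    return state


PIPELINE: List[Callable[[State], State]] = [
    think, plan, run_plugins, retrieve, verify, synthesize,
]


def run(query: str) -> State:
    """Pass the state through each stage in diagram order."""
    state: State = {"query": query}
    for step in PIPELINE:
        state = step(state)
    return state


result = run("compare ROCm versions and summarize the changelog")
print(result["response"])
```

In a graph framework each function would become a node and the list order would become edges, which is what makes conditional loops such as re-running retrieval after a failed verification straightforward to add.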

🛠️ Technologies Used

LLM: 13B TIGER-OM (Q4_K_M GGUF)
RAG Framework: SKT RAG + ADK Kit
Agent Framework: LangGraph
GPU Stack: ROCm 7.0 + AMD ADK Kit
Inference: vLLM (FP16) + llama.cpp (GGUF Q4_K_M)
Hardware: AMD MI300X
Cloud: AMD Developer Cloud

🚀 Quick Start - GGUF Q4_K_M

# Using llama.cpp
./llama-cli \
  -m tiger-om-q4_k_m.gguf \
  -p "Your complex query here..." \
  -n 1024 \
  -t 8 \
  --temp 0.7

Python Example (llama-cpp-python)

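from llama_cpp import Llama

# Load the quantized model from a local GGUF file.
llm = Llama(
    model_path="tiger-om-q4_k_m.gguf",
    n_gpu_layers=-1,      # -1 offloads every layer to the GPU
    n_ctx=8192,           # context window in tokens
    verbose=False
)

# Send a single-turn chat request.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain..."}],
    temperature=0.7,
    max_tokens=1024
)

# The reply text is in the first choice's message.
print(response["choices"][0]["message"]["content"])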
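
For long answers it can help to print tokens as they arrive. The sketch below shows one way to pull text out of streamed chat chunks; `iter_text` is a hypothetical helper, and the simulated chunks assume the OpenAI-style `delta` layout that llama-cpp-python emits when `create_chat_completion` is called with `stream=True`.

```python
from typing import Any, Dict, Iterable, Iterator


def iter_text(chunks: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming chat chunks."""
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        piece = delta.get("content")
        if piece:  # role-only and final chunks carry no text
            yield piece


# Simulated chunks so the helper can be tried without loading a model.
fake_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]

text = "".join(iter_text(fake_stream))
print(text)
```

With a loaded model, the same helper would wrap the real stream: pass `stream=True` to `llm.create_chat_completion(...)` and print each fragment from `iter_text(...)` with `end=""` and `flush=True`.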

📁 Repository Structure

/skt_ai_labs — Core ADK + RAG integration
/plugins — Plugin system
/agents — LangGraph workflows
/examples — Ready-to-use examples
/docs — Architecture & guides

🏆 Hackathon Information

Event: AMD Developer Hackathon 2026
Trained on: AMD Developer Cloud ($100 credits)
Built in Public: Regular technical updates shared
Goal: Showcase agentic AI on the AMD ROCm ecosystem

📄 License

MIT