Shrijanagain committed on
Commit fb69690 · verified · 1 Parent(s): ccd2d40

Update README.md

Files changed (1)
  1. README.md +137 -32
README.md CHANGED
@@ -1,50 +1,155 @@
1
  ---
2
- base_model: Shrijanagain/TIGER-PASS-V1-ARCHIVE
 
3
  tags:
4
  - llama-cpp
5
- - gguf-my-repo
 
 
 
 
 
 
 
6
  ---
 
7
 
8
- # Shrijanagain/TIGER-PASS-V1-ARCHIVE-Q4_K_M-GGUF
9
- This model was converted to GGUF format from [`Shrijanagain/TIGER-PASS-V1-ARCHIVE`](https://huggingface.co/Shrijanagain/TIGER-PASS-V1-ARCHIVE) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
10
- Refer to the [original model card](https://huggingface.co/Shrijanagain/TIGER-PASS-V1-ARCHIVE) for more details on the model.
11
 
12
- ## Use with llama.cpp
13
- Install llama.cpp through brew (works on Mac and Linux)
14
 
15
- ```bash
16
- brew install llama.cpp
17
 
18
- ```
19
- Invoke the llama.cpp server or the CLI.
20
 
21
- ### CLI:
22
- ```bash
23
- llama-cli --hf-repo Shrijanagain/TIGER-PASS-V1-ARCHIVE-Q4_K_M-GGUF --hf-file tiger-pass-v1-archive-q4_k_m.gguf -p "The meaning to life and the universe is"
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
24
  ```
25
 
26
- ### Server:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
27
  ```bash
28
- llama-server --hf-repo Shrijanagain/TIGER-PASS-V1-ARCHIVE-Q4_K_M-GGUF --hf-file tiger-pass-v1-archive-q4_k_m.gguf -c 2048
 
 
 
 
 
 
29
  ```
30
 
31
- Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
32
 
33
- Step 1: Clone llama.cpp from GitHub.
34
- ```
35
- git clone https://github.com/ggerganov/llama.cpp
36
- ```
37
 
38
- Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
39
- ```
40
- cd llama.cpp && LLAMA_CURL=1 make
41
- ```
 
 
42
 
43
- Step 3: Run inference through the main binary.
44
- ```
45
- ./llama-cli --hf-repo Shrijanagain/TIGER-PASS-V1-ARCHIVE-Q4_K_M-GGUF --hf-file tiger-pass-v1-archive-q4_k_m.gguf -p "The meaning to life and the universe is"
46
- ```
47
- or
48
- ```
49
- ./llama-server --hf-repo Shrijanagain/TIGER-PASS-V1-ARCHIVE-Q4_K_M-GGUF --hf-file tiger-pass-v1-archive-q4_k_m.gguf -c 2048
50
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ base_model:
3
+ - mistralai/Mistral-7B-Instruct-v0.3
4
  tags:
5
  - llama-cpp
6
+ license: mit
7
+ datasets:
8
+ - SKT-NRS/SKT-OMNI-CORPUS-146T-V1
9
+ language:
10
+ - en
11
+ - hi
12
+ pipeline_tag: text-generation
13
+ library_name: transformers
14
  ---
15
+ # 🚀 SKT-OM (TIGER-OM) - Agentic RAG System
16
 
17
+ **Advanced 13B Agentic RAG with Think Mode + Dynamic Plugins + LangGraph**
 
 
18
 
19
+ Built for the **AMD Developer Hackathon 2026** on AMD Developer Cloud.
 
20
 
21
+ ---
 
22
 
23
+ ## 🌟 Project Overview
 
24
 
25
+ **SKT-OM** (also known as **TIGER-OM**) is a **13B-parameter, fully agentic Retrieval-Augmented Generation (RAG)** system. It extends traditional RAG by integrating:
26
+
27
+ - **Think Mode**: Advanced multi-step reasoning engine
28
+ - **Dynamic Plugin Architecture**: Intelligent tool selection & execution
29
+ - **LangGraph Multi-Agent Workflow**: Stateful agent collaboration
30
+ - **SKT RAG**: High-performance retrieval pipeline
31
+
32
+ The system takes a natural-language query and returns a reasoned response, using tools and a verification step where needed.
33
+
34
+ ---
35
+
36
+ ## 📊 Model Details
37
+
38
+ - **Model Name**: TIGER-OM (SKT-OM)
39
+ - **Parameters**: 13 Billion
40
+ - **Base Model**: Custom trained on AMD hardware
41
+ - **Quantization**: **Q4_K_M** (Excellent balance between quality and size)
42
+ - **GGUF Format**: Optimized for CPU + GPU inference
43
+ - **Training Hardware**: AMD Developer Cloud GPUs ($100 credits)
44
+ - **Inference**: ROCm 7.0 + vLLM (Full FP16) + GGUF (Q4_K_M)
45
+
46
+ The **Q4_K_M version** provides near-FP16 reasoning quality while being much more memory-efficient and faster on consumer and professional hardware.
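
As a rough illustration of the memory claim above, the sketch below estimates weight storage for a 13B-parameter model at FP16 versus Q4_K_M. The figure of about 4.85 bits per weight for Q4_K_M is an assumption based on commonly reported llama.cpp averages, not a measurement of this model; the real file size depends on the architecture and tensor mix, and the KV cache and activations are not counted.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 13e9      # parameter count stated in this README
FP16_BITS = 16.0     # two bytes per weight
Q4_K_M_BITS = 4.85   # ASSUMED average for Q4_K_M, not measured on this model

fp16_gb = weight_memory_gb(N_PARAMS, FP16_BITS)
q4_gb = weight_memory_gb(N_PARAMS, Q4_K_M_BITS)

print(f"FP16 weights:   ~{fp16_gb:.1f} GB")
print(f"Q4_K_M weights: ~{q4_gb:.1f} GB")
print(f"Reduction:      ~{fp16_gb / q4_gb:.1f}x")
```

Under these assumptions the weights shrink from about 26 GB to about 7.9 GB, which is what lets the quantized file fit on a single consumer GPU or in system RAM.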
47
+
48
+ ---
49
+
50
+ ## ✨ Key Features
51
+
52
+ - **Think Mode Engine**: Chain-of-Thought, Self-Reflection, Verification Loops, and Self-Critique
53
+ - **Plugin Ecosystem**: Code Runner, Math Solver, Web Search, Data Analyzer, Document Parser + Custom Plugins
54
+ - **Advanced RAG**: SKT RAG with query rewriting, multi-hop retrieval, reranking & contextual compression
55
+ - **Multi-Agent System**: LangGraph-powered stateful workflow
56
+ - **Memory**: Persistent conversation state
57
+ - **Tool Use**: Dynamic plugin routing based on query intent
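
To make "dynamic plugin routing based on query intent" concrete, here is a minimal sketch of one way such a router could work. It is not the project's actual implementation: the plugin functions, the `KEYWORDS` table, and the keyword-matching heuristic are all hypothetical, and a real system would more likely let the model itself classify intent.

```python
import re
from typing import Callable, Dict, List

# Hypothetical plugins: each takes the query and returns a text result.
def math_solver(query: str) -> str:
    return f"[math_solver] would evaluate: {query}"

def code_runner(query: str) -> str:
    return f"[code_runner] would execute code for: {query}"

def web_search(query: str) -> str:
    return f"[web_search] would search for: {query}"

PLUGINS: Dict[str, Callable[[str], str]] = {
    "math_solver": math_solver,
    "code_runner": code_runner,
    "web_search": web_search,
}

# Illustrative intent keywords only; not taken from the repository.
KEYWORDS: Dict[str, List[str]] = {
    "math_solver": ["solve", "integral", "equation", "calculate"],
    "code_runner": ["run", "python", "script", "execute"],
    "web_search": ["latest", "news", "search", "today"],
}

def route(query: str) -> List[str]:
    """Return the names of plugins whose keywords appear in the query."""
    words = set(re.findall(r"[a-z0-9_]+", query.lower()))
    return [name for name, kws in KEYWORDS.items() if words & set(kws)]

def run_plugins(query: str) -> List[str]:
    """Execute every matched plugin; an empty list means no tool is needed."""
    return [PLUGINS[name](query) for name in route(query)]

if __name__ == "__main__":
    for q in ["Solve this equation: x^2 = 9", "Run this Python script", "Hello!"]:
        print(q, "->", route(q))
```

A query that matches no keywords returns an empty list, which the caller can treat as "answer directly without tools".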
58
+
59
+ ---
60
+
61
+ ## 🔗 Important Links
62
+
63
+ - **Live Demo**: [https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/SKT-OM](https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/SKT-OM)
64
+ - **Main Model Repo**: [Shrijanagain/TIGER-OM](https://huggingface.co/Shrijanagain/TIGER-OM)
65
+ - **GGUF Quantized Models (Q4_K_M)**: [Shrijanagain/TIGER-GGUF](https://huggingface.co/Shrijanagain/TIGER-GGUF)
66
+ - **GitHub Repository (RAG + ADK)**: [https://github.com/SHRIJANAGAIN/SKT-AMD-FILES](https://github.com/SHRIJANAGAIN/SKT-AMD-FILES)
67
+
68
+ ---
69
+
70
+ ## How It Works
71
+
72
+ ```mermaid
73
+ graph TD
74
+ A[User Query] --> B[Think Mode]
75
+ B --> C[Decomposition & Planning]
76
+ C --> D[Plugin Router]
77
+ C --> E[SKT RAG Retrieval]
78
+ D --> F[Execute Plugins]
79
+ E --> G[Context Processing]
80
+ F & G --> H[Verification Loop]
81
+ H --> I[LangGraph Synthesis]
82
+ I --> J[Final Response]
83
  ```
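
The flow in the diagram can be sketched in plain Python as a sequence of steps over a shared state object. This is only an illustration of the control flow: every function body is a placeholder, the names are hypothetical, and the actual project is described as using LangGraph rather than a hand-written loop. The diagram shows the plugin and retrieval branches side by side, whereas this sketch runs them one after the other for simplicity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class State:
    """Shared state passed between the pipeline steps."""
    query: str
    plan: List[str] = field(default_factory=list)
    plugin_results: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)
    verified: bool = False
    answer: str = ""


def think_and_plan(state: State) -> State:
    # Placeholder for Think Mode + Decomposition & Planning:
    # split the query into sub-tasks on the word "and".
    state.plan = [p.strip() for p in state.query.split(" and ") if p.strip()]
    return state


def execute_plugins(state: State) -> State:
    # Placeholder for Plugin Router -> Execute Plugins.
    state.plugin_results = [f"plugin output for: {s}" for s in state.plan]
    return state


def retrieve_context(state: State) -> State:
    # Placeholder for SKT RAG Retrieval -> Context Processing.
    state.context = [f"retrieved passage for: {s}" for s in state.plan]
    return state


def verify(state: State) -> State:
    # Placeholder Verification Loop: every planned step must have
    # both a plugin result and a retrieved passage.
    n = len(state.plan)
    state.verified = n > 0 and len(state.plugin_results) == n == len(state.context)
    return state


def synthesize(state: State) -> State:
    # Placeholder for the synthesis step that produces the final response.
    status = "verified" if state.verified else "unverified"
    state.answer = f"({status}) " + " | ".join(state.plugin_results + state.context)
    return state


def run_pipeline(query: str) -> State:
    state = State(query=query)
    for step in (think_and_plan, execute_plugins, retrieve_context, verify, synthesize):
        state = step(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("compute 2+2 and find the source").answer)
```

Keeping all intermediate results on one state object mirrors the "stateful workflow" idea: each step reads what earlier steps wrote and adds its own output.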
84
 
85
+ ---
86
+
87
+ ## 🛠️ Technologies Used
88
+
89
+ - **LLM**: 13B TIGER-OM (Q4_K_M GGUF)
90
+ - **RAG Framework**: SKT RAG + ADK Kit
91
+ - **Agent Framework**: LangGraph
92
+ - **GPU Stack**: ROCm 7.0 + AMD ADK Kit
93
+ - **Inference**: vLLM (FP16) + llama.cpp (GGUF Q4_K_M)
94
+ - **Hardware**: AMD MI300X
95
+ - **Cloud**: AMD Developer Cloud
96
+
97
+ ---
98
+
99
+ ## 🚀 Quick Start - GGUF Q4_K_M
100
+
101
  ```bash
102
+ # Using llama.cpp
103
+ ./llama-cli \
104
+ -m tiger-om-q4_k_m.gguf \
105
+ -p "Your complex query here..." \
106
+ -n 1024 \
107
+ -t 8 \
108
+ --temp 0.7
109
  ```
110
 
111
+ **Python Example (llama-cpp-python)**
112
 
113
+ ```python
114
+ from llama_cpp import Llama
 
 
115
 
116
+ llm = Llama(
117
+ model_path="tiger-om-q4_k_m.gguf",
118
+ n_gpu_layers=-1, # Use all GPU layers
119
+ n_ctx=8192,
120
+ verbose=False
121
+ )
122
 
123
+ response = llm.create_chat_completion(
124
+ messages=[{"role": "user", "content": "Explain..."}],
125
+ temperature=0.7,
126
+ max_tokens=1024
127
+ )
128
+
129
+ print(response['choices'][0]['message']['content'])
130
  ```
131
+
132
+ ---
133
+
134
+ ## 📁 Repository Structure
135
+
136
+ - `/skt_ai_labs`: Core ADK + RAG integration
137
+ - `/plugins`: Plugin system
138
+ - `/agents`: LangGraph workflows
139
+ - `/examples`: Ready-to-use examples
140
+ - `/docs`: Architecture & guides
141
+
142
+ ---
143
+
144
+ ## 🏆 Hackathon Information
145
+
146
+ - **Event**: AMD Developer Hackathon 2026
147
+ - **Trained on**: AMD Developer Cloud ($100 credits)
148
+ - **Built in Public**: Regular technical updates shared
149
+ - **Goal**: Showcase agentic AI on the AMD ROCm ecosystem
150
+
151
+ ---
152
+
153
+ ## 📄 License
154
+
155
+ *MIT*