MridulNegi2005 committed on
Commit 06c14d8 · 1 Parent(s): 56a4b39

docs: update README with links - Kaggle notebook, HuggingFace demo, GitHub profiles

Files changed (1)
  1. README.md +251 -124
README.md CHANGED
@@ -1,193 +1,320 @@
  ---
- title: Project Mahoraga
  emoji: ⚔️
  colorFrom: red
  colorTo: gray
  sdk: docker
  app_port: 7860
  ---
- # ⚔️ Project Mahoraga — Adaptation Engine

- An RL-based combat AI that learns to fight like Mahoraga from JJK — observing enemy attack patterns, adapting its resistances in real-time, and executing devastating Judgment Strikes when stacks are built.

- **Trained on Qwen 2.5 3B (LoRA)** using reward-weighted SFT with a custom curriculum-based enemy across 3 phases.

- ![Aero-Tactical Dashboard](docs/dashboard_preview.png)

  ---

- ## 🚀 Quick Start

- ### Frontend Dashboard (Recommended)

- ```bash
- # Terminal 1: Start the API server
- python api.py
- # → FastAPI on http://localhost:8000
-
- # Terminal 2: Start the React dashboard
- cd frontend
- npm install   # first time only
- npm run dev
- # → Dashboard on http://localhost:5173
- ```

- ### Gradio UI (Lightweight Alternative)

- ```bash
- python app.py
- # Gradio on http://localhost:7860
- ```

  ---

- ## 🎮 Features

- ### Aero-Tactical Dashboard
- - **Viewport-locked bento-grid** layout — no scrolling, pure tactical overview
- - **Animated HP / Resistance bars** with spring physics (Framer Motion)
- - **Golden Mahoraga Wheel** — rotates 45° per adaptation, 180° on Judgment Strike
- - **Screen shake** on heavy hits, full-screen flash on Judgment Strike
- - **Combat log** with timeline-style entries and color-coded attack categories

- ### Difficulty Levels

- | Level | Enemy Behavior | Color |
- |-------|---------------|-------|
- | **EASY** | Always PHYSICAL attacks (Phase I only) | 🟢 Green |
- | **MEDIUM** | Cycling attacks, no adaptive targeting | 🟡 Amber |
- | **HARD** | Full 3-phase AI — targets your weakest resistance | 🔴 Red |

- ### LLM Auto-Play

- Click **▶ LLM AUTO** to let the trained Qwen 2.5 3B model fight autonomously:
- - Model loads on first click (~30-60s, uses ~2.5GB VRAM)
- - Plays one turn every 1.2s with full animations
- - Falls back to a smart rule-based agent if GPU unavailable

- ### Color-Coded Attack Categories

- | Category | Color | Subtypes |
- |----------|-------|----------|
- | **PHYSICAL** | 🟠 Orange | SLASH, IMPACT, PIERCE |
- | **CE** (Cursed Energy) | 🟣 Purple | BLAST, WAVE, BEAM |
- | **TECHNIQUE** | 🔵 Teal | SPIKE, DELAYED, PATTERN |

  ---

- ## 🏗️ Architecture

  ```
- Mahoraga/
- ├── api.py                   # FastAPI server (REST endpoints + LLM inference)
- ├── app.py                   # Gradio UI (standalone alternative)
- ├── env/
- │   ├── mahoraga_env.py      # Main RL environment (MahoragaEnv)
- │   ├── enemy.py             # 3-phase curriculum enemy (CurriculumEnemy)
- │   ├── mechanics.py         # Combat math (damage, resistances, judgment)
- │   └── rewards.py           # Reward functions (7 components)
- ├── utils/
- │   ├── constants.py         # Game constants (HP, damage, categories)
- │   └── validators.py        # Action validation
- ├── frontend/                # React dashboard
- │   ├── src/App.jsx          # Main UI (727 lines)
- │   ├── src/index.css        # Design system (glass panels, animations)
- │   └── vite.config.js       # Vite + proxy to FastAPI
- ├── mahoraga_loral_final/    # Trained LoRA weights (not in git)
- │   ├── adapter_config.json
- │   ├── adapter_model.safetensors
- │   └── tokenizer*.json
- └── notebooks/
-     └── mahoraga_training.py # Kaggle training notebook
  ```

  ---

- ## 🧠 How The AI Works

- ### Environment (MahoragaEnv)

- Turn-based combat where Mahoraga has 5 actions:
- - **0-2:** Adapt resistance (Physical / CE / Technique) — +40 to target, -20 to others
- - **3:** Judgment Strike — base 350 DMG + 50 per adaptation stack (resets stacks)
- - **4:** Regeneration — heal 300 HP (3-turn cooldown)

- ### Curriculum Enemy (3 Phases)

- | Phase | Turns | Behavior |
- |-------|-------|----------|
- | I — Tutorial | 1-5 | Always PHYSICAL |
- | II — Pattern | 6-15 | Cycles PHYSICAL → CE → TECHNIQUE (15% random deviation) |
- | III — Adaptive | 16+ | Targets the agent's **lowest resistance** |

- ### Reward Signal (7 components)

- | Component | Signal |
- |-----------|--------|
- | Survival | `-damage_taken / 100` |
- | Combat | `+damage_dealt / 80` |
- | Adaptation | `+0.8` for correct match |
- | Anti-cowardice | `-1.0` for healing at high HP |
- | Efficiency | `+1.0` for burst damage ≥200 |
- | Terminal | `+10` win / `-8` loss |
- | Opportunity | `-0.5` for not attacking at stack ≥2 |

- ### Training

- - **Model:** Qwen 2.5 3B Instruct (4-bit quantized via Unsloth)
- - **Method:** LoRA (r=16, α=16) targeting q/k/v/o projections
- - **Algorithm:** Reward-weighted SFT with episode-level modifiers + expert trajectory seeding
- - **Platform:** Kaggle (T4 GPU)

  ---

- ## 🔌 API Reference

- | Method | Endpoint | Body | Description |
- |--------|----------|------|-------------|
- | `POST` | `/api/reset` | `{ "difficulty": "easy"\|"medium"\|"hard" }` | Reset environment |
- | `POST` | `/api/step` | `{ "action": 0-4 }` | Execute one manual turn |
- | `POST` | `/api/auto-step` | — | LLM picks the action |
- | `GET` | `/api/model-status` | — | Check if LLM is loaded |

  ---

- ## 📦 Dependencies

- ### Backend
  ```
- fastapi uvicorn pydantic       # API server
- torch transformers peft        # LLM inference
- bitsandbytes accelerate        # 4-bit quantization
- unsloth                        # Optional: faster inference
- gradio                         # Alternative UI
  ```

- ### Frontend
  ```
- react framer-motion            # UI + animations
- tailwindcss @tailwindcss/vite  # Styling
- vite                           # Build tool
  ```

- ---

- ## 🖥️ Hardware Requirements

- | Component | Minimum | Recommended |
- |-----------|---------|-------------|
- | GPU (for LLM) | GTX 1650 (4GB) | RTX 3060 (12GB) |
- | RAM | 8 GB | 16 GB |
- | Storage | 500 MB (+ ~2GB model) | Same |

- > **No GPU?** The dashboard works fully in manual mode. Auto-play falls back to a rule-based agent.

  ---

- ## 👥 Team

- Built by **Atishay** — [GitHub](https://github.com/Atishay9828)

  ---

  *"The more it is hit, the more it adapts. That is the nature of Mahoraga."*

  ---
+ title: "Mahoraga — Adaptive Combat RL Environment"
  emoji: ⚔️
  colorFrom: red
  colorTo: gray
  sdk: docker
  app_port: 7860
+ tags:
+ - reinforcement-learning
+ - openenv
+ - llm-agent
+ - adaptive-ai
+ - grpo
+ - unsloth
+ - trl
  ---

+ <div align="center">

+ <img src="docs/mahoraga_wheel.svg" alt="Mahoraga Wheel" width="200"/>

+ # ⚔️ DIVINE GENERAL MAHORAGA
+
+ ### A boss that learns how you fight. Then makes sure it never works again.
+
+ <br>
+
+ **Meta OpenEnv Hackathon 2026** · OpenEnv + TRL + Unsloth
+
+ [![Adapts](https://img.shields.io/badge/Adapts_to-Everything-crimson?style=for-the-badge)]()
+ [![Reward](https://img.shields.io/badge/Avg_Reward-18.55-blue?style=for-the-badge)]()
+ [![Survived](https://img.shields.io/badge/Players_Who_Survived-Good_Luck-black?style=for-the-badge)]()
+
+ 📓 [**Training Notebook**](https://www.kaggle.com/code/atishay9828/meta-mahoraga/edit) · 🤗 [**Live Demo**](https://huggingface.co/spaces/MridulNegi2005/Project-Mahoraga) · 🏠 [**GitHub**](https://github.com/MridulNegi2005/Project_Mahoraga)
+
+ </div>
 
  ---

+ ## 🎮 Let's Be Honest — Boss Fights Are Broken

+ Every game. Every RPG. Every "epic final boss."

+ You die once. You learn the pattern. You spam the same combo. Boss dead. GG.

+ **Dodge → Hit → Dodge → Hit.**

+ That's not strategy. That's muscle memory with extra steps. The boss doesn't learn. The boss doesn't care that you've been spamming the same fire spell for the last 12 turns. It just stands there and takes it.
+
+ We've been fighting NPCs that are *literally designed to lose.*
+
+ ---
+
+ **Now imagine a boss that watches you.**
+
+ Every move you make? Noted. Every attack you spam? Countered. You found a winning strategy? Congrats — it worked once. Try it again and you'll hit a wall of resistance so thick your damage might as well be a gentle breeze.
+
+ **That's Mahoraga.**
+
+ Straight from the Jujutsu Kaisen universe — the shikigami that *nobody has ever tamed.* Not because it's the strongest. Because it **adapts to anything you throw at it.**
+
+ You slash it? It builds resistance to slashing.
+ You blast it with cursed energy? It learns to tank that too.
+ You try the same thing twice? That's cute. It already adapted.
+
+ We took that concept and turned it into a real, trainable RL environment — powered by an LLM that actually *learns* to be this terrifying.

  ---

+ ## ⚔️ How Mahoraga Hunts You
+
+ Mahoraga isn't a scripted boss. It's an LLM (Qwen 2.5 3B) fine-tuned through reinforcement learning to make tactical combat decisions in real time.

+ ### The Resistance Engine

+ This is the core. This is what makes Mahoraga... *Mahoraga.*

+ Every time the agent observes your attack pattern, it builds resistance:

+ ```
+ You attack PHYSICAL → Mahoraga adapts → +40% Physical Resistance
+ You attack PHYSICAL → Mahoraga adapts → +80% Physical Resistance (CAPPED)
+ Your PHYSICAL damage? Basically zero now.
+
+ You switch to CE? → Mahoraga's watching. It'll catch on.
+ ```
+
+ Resistance isn't free, though. Building one defense weakens the others (−20% to non-adapted types). Mahoraga has to *choose* what to defend against — and if it reads you wrong, that's your opening.
+
+ **But here's the thing: it almost never reads you wrong anymore.**
+
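The resistance update above can be sketched in a few lines of Python. The numbers (+40 per adaptation, the 80% cap, −20 to the non-adapted categories) come from this README; the floor at 0 and the function shape are illustrative assumptions, not the project's actual `mechanics.py` API:

```python
# Minimal sketch of the resistance engine described in the README.
# +40 per adaptation (capped at 80) and -20 to the other categories are
# from the text; the floor at 0 is an assumption.
CATEGORIES = ("PHYSICAL", "CE", "TECHNIQUE")

def adapt(resistances, target):
    """Raise resistance to `target`; the other two categories weaken."""
    out = dict(resistances)
    for cat in CATEGORIES:
        if cat == target:
            out[cat] = min(80, out[cat] + 40)  # cap at 80%
        else:
            out[cat] = max(0, out[cat] - 20)   # assumed floor at 0
    return out

res = {c: 0 for c in CATEGORIES}
res = adapt(res, "PHYSICAL")  # PHYSICAL rises to 40, others stay at 0
res = adapt(res, "PHYSICAL")  # PHYSICAL hits the 80% cap
```

Two adaptations against the same attack type are enough to hit the cap, which is exactly why spamming one category stops working.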
+ ### The Judgment Strike 💀
+
+ When Mahoraga has stacked enough correct adaptations, it unleashes **Judgment Strike** — a devastating burst that can deal **350 + 50 per stack** damage in a single turn.
+
+ The catch? Judgment Strike resets everything. All resistances. All stacks. Back to zero.
+
+ So Mahoraga has to decide: *Do I keep building defenses, or do I cash in right now for a massive hit?*
+
+ The trained model learned the answer. It waits. It stacks. It times. Then it deletes you in one move.
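The trade-off is easy to see in numbers. A minimal sketch of the damage formula from the text (350 base + 50 per stack, all stacks consumed); modeling only the stack reset keeps it small, since the resistance reset follows the same pattern:

```python
# Judgment Strike: 350 base damage + 50 per adaptation stack,
# and the strike consumes every stack (per the README).
def judgment_strike(stacks):
    """Return (damage, remaining_stacks) for a Judgment Strike."""
    damage = 350 + 50 * stacks
    return damage, 0  # all stacks are consumed

damage, stacks = judgment_strike(4)  # → (550, 0)
```

Each extra stack is worth a flat +50, so waiting is always more damage per strike; the question the agent has to learn is whether it survives long enough to collect.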
 
+ ### The 3-Phase Escalation

+ Mahoraga doesn't start at full power. It *wakes up.*

+ | Phase | Turns | What You're Facing |
+ |-------|-------|-------------------|
+ | 🟢 **Awakening** | 1–5 | Predictable patterns. You think you've got this. |
+ | 🟡 **Reading** | 6–15 | Cycling attack types. 15% random deviation. It's testing you. |
+ | 🔴 **Hunting** | 16+ | It reads your weakest resistance and **targets it directly.** |
+
+ By Phase 3, Mahoraga isn't reacting anymore. It's *predicting.* It looks at your defenses, finds the gap, and exploits it. Every. Single. Turn.
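The three phases above can be sketched as a single attack-selection function. The turn boundaries, the 15% deviation, and the lowest-resistance targeting come from the table; the function name, argument names, and the exact cycling rule in Phase 2 are illustrative assumptions, not the project's `enemy.py` API:

```python
# Hedged sketch of the 3-phase curriculum enemy from the table above.
import random

CYCLE = ("PHYSICAL", "CE", "TECHNIQUE")

def enemy_attack(turn, agent_resistances, rng=random):
    if turn <= 5:                        # Phase 1 — Awakening: fixed pattern
        return "PHYSICAL"
    if turn <= 15:                       # Phase 2 — Reading: cycle + noise
        if rng.random() < 0.15:          # 15% random deviation
            return rng.choice(CYCLE)
        return CYCLE[(turn - 6) % 3]
    # Phase 3 — Hunting: exploit the agent's weakest resistance
    return min(agent_resistances, key=agent_resistances.get)

res = {"PHYSICAL": 80, "CE": 20, "TECHNIQUE": 40}
enemy_attack(20, res)  # → 'CE' (the lowest resistance)
```

Phase 3 is what forces the agent to keep rotating its defenses: any category it neglects becomes the enemy's next target.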
 
  ---

+ ## 🧠 What Mahoraga Learned (On Its Own)
+
+ We didn't program a strategy. We didn't hardcode "adapt then strike." We gave it a reward signal and an environment that punishes repetition.
+
+ Here's what emerged:
+
+ ### Early Training: A Confused Shikigami
+
+ Iteration 1 Mahoraga is... sad. It picks random actions. It heals when it's at full HP. It attacks without building stacks. It dies to its own inefficiency.
+
+ **Avg reward: −10.47. Didn't win a single fight.**
+
+ ### Late Training: The Divine General Awakens
+
+ By iteration 5, Mahoraga independently discovered an optimal combat loop:

  ```
+ 👁️ OBSERVE → Read the incoming attack type
+ 🛡️ ADAPT   → Build resistance to exactly that category
+ ⚡ STACK   → Accumulate adaptation bonuses
+ ⚔️ STRIKE  → Judgment Strike at peak damage
+ 🔄 RESET   → Switch stance, never repeat, keep hunting
  ```

+ It stopped spamming. It stopped healing unnecessarily. It started *reading the fight* and making decisions based on what would maximize damage output per turn.
+
+ **This wasn't coded. This was learned.**
+
+ The agent went from a mindless attacker to a calculated predator — adapting, timing, and striking with surgical precision. In just 5 training iterations.
+
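The observe → adapt → stack → strike loop has the same shape as a tiny rule-based policy (the project mentions a rule-based fallback agent for the no-GPU case). A minimal sketch, assuming the README's action list (0–2 adapt, 3 Judgment Strike, 4 heal); the stack threshold of 3 and the 25% HP heal trigger are illustrative assumptions:

```python
# Hedged sketch of the learned loop as a rule-based policy.
# Action ids follow the README (0-2 adapt, 3 Judgment Strike, 4 heal);
# the thresholds here are assumptions, not the project's fallback agent.
ADAPT = {"PHYSICAL": 0, "CE": 1, "TECHNIQUE": 2}
JUDGMENT, HEAL = 3, 4

def pick_action(incoming, stacks, hp, max_hp, strike_at=3):
    if hp < 0.25 * max_hp:      # emergency heal only — no healing spam
        return HEAL
    if stacks >= strike_at:     # cash in: Judgment Strike at peak damage
        return JUDGMENT
    return ADAPT[incoming]      # otherwise adapt to the observed attack

pick_action("CE", stacks=1, hp=900, max_hp=1000)  # → 1 (adapt to CE)
pick_action("CE", stacks=3, hp=900, max_hp=1000)  # → 3 (Judgment Strike)
```

The point of the comparison: the trained LLM converged on the same loop shape without anyone writing these rules down.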
  ---

+ ## 📈 The Glow-Up: From Punching Bag to Final Boss
+
+ Training: 5 iterations of reward-weighted SFT on Qwen 2.5 3B (LoRA, Kaggle T4 GPU).
+
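One common way to implement reward-weighted SFT is to scale each episode's cross-entropy loss by a weight derived from its total reward, so good episodes dominate the gradient. A minimal sketch: the min-max weighting scheme and the `floor` parameter below are illustrative assumptions, not the notebook's exact formula:

```python
# Hedged sketch: map episode rewards to per-episode SFT loss weights.
# Min-max normalization with a floor is one common choice, shown here
# as an assumption — not the project's exact episode-level modifier.
def reward_weights(episode_rewards, floor=0.1):
    """Map raw episode rewards to loss weights in [floor, 1.0]."""
    lo, hi = min(episode_rewards), max(episode_rewards)
    if hi == lo:  # all episodes equally good: weight them equally
        return [1.0] * len(episode_rewards)
    return [floor + (1 - floor) * (r - lo) / (hi - lo) for r in episode_rewards]

# A weighted SFT step then scales each episode's token loss:
#   loss = sum(w_i * ce_loss_i) / sum(w_i)
weights = reward_weights([-10.47, 3.2, 18.55])
```

The floor keeps weak episodes from vanishing entirely, which preserves some exploration signal while still pushing the model toward high-reward trajectories.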
152
+ ### Training Metrics — The Full Picture
153
+
154
+ <div align="center">
155
+ <img src="docs/training_metrics.png" alt="Training Metrics — Reward, Win Rate, Adaptation, Attacks" width="750"/>
156
+ </div>
157
 
158
+ <br>
159
 
160
+ **28-point reward swing.** Win rate 0% → undefeated. Attacks per episode cut in half. Every chart tells the same story: Mahoraga went from clueless to lethal.
 
 
 
161
 
162
+ ### Final Boss Performance (10-Episode Eval)
163
 
164
+ | Episode | Reward | Result | Attacks Used | Adaptation Rate |
165
+ |---------|--------|--------|-------------|-----------------|
166
+ | 1 | 13.48 | Won | 7 | 22.2% |
167
+ | 2 | 19.86 | Won | 4 | 33.3% |
168
+ | 3 | 19.07 | Won | 4 | 42.9% |
169
+ | 4 | 19.07 | ✅ Won | 4 | 42.9% |
170
+ | 5 | 19.86 | ✅ Won | 4 | 33.3% |
171
+ | 6 | 19.77 | ✅ Won | 4 | 33.3% |
172
+ | 7 | 14.88 | ✅ Won | 7 | 12.5% |
173
+ | 8 | 19.86 | ✅ Won | 4 | 33.3% |
174
+ | 9 | 19.86 | ✅ Won | 4 | 33.3% |
175
+ | 10 | 19.77 | ✅ Won | 4 | 33.3% |
176
 
177
+ **10/10 wins. 80% of fights ended in just 4 moves.**
178
 
179
+ Look at episodes 2–6 and 8–10. Same pattern. Same efficiency. Same ruthless execution. Mahoraga found the optimal loop and locked in.
 
 
 
 
 
 
 
 
180
 
181
+ ### The Before & After
182
 
183
+ | | 💀 Untrained | ⚔️ Trained |
184
+ |---|---|---|
185
+ | **Reward** | −10.47 | **+18.55** |
186
+ | **Win Rate** | 0% | **Undefeated** |
187
+ | **Attacks to Win** | ~9 (still lost) | **4** |
188
+ | **Adaptation Accuracy** | 0% | **33–43%** |
189
+ | **Healing Spam** | Constant | **Zero** |
190
+
191
+ That "Attacks to Win" drop is the real story. The untrained model threw 9 attacks and still lost. The trained model needs 4 and it's done. That's not incremental improvement — that's a fundamentally different strategy.
192
 
  ---

+ ## 💡 Why Should You Care?
+
+ Mahoraga isn't just an anime-inspired boss fight (though it absolutely is that too).
+
+ It's a proof of concept for **adaptive RL agents** — LLMs that change their behavior when the environment changes around them.
+
+ ### The Real Problem
+
+ Most LLM agents are one-trick ponies. They find what works and repeat it. That's fine when the world is static.

+ But the world isn't static:
+ - **Negotiation opponents** change their strategy mid-conversation
+ - **Code environments** evolve: yesterday's fix is today's bug
+ - **Patients** respond differently to treatment over time
+ - **Markets** punish anyone who keeps running the same playbook
+
+ Mahoraga proves that an LLM can learn to **stop repeating itself** when repetition gets punished. That's a small mechanic with massive implications.
+
+ ### What We Actually Proved
+
+ | Claim | Evidence |
+ |-------|----------|
+ | LLMs can learn adaptive sequential behavior through RL | 0% → Undefeated win rate in 5 iterations |
+ | Emergent strategy arises without explicit programming | Agent independently discovered adapt→stack→strike loop |
+ | Environment-based rewards outperform static reward models | 7 composable reward functions, each preventing a specific exploit |
+ | Resistance mechanics force genuine adaptation | Repeating strategies reduces damage to near-zero |
  ---

+ ## 🏗️ Under the Hood (Quick)

  ```
+ ┌──────────┐     ┌───────────────┐     ┌──────────────┐
+ │ OpenEnv  │────▶│   Mahoraga    │────▶│ Qwen 2.5 3B  │
+ │          │     │  Environment  │     │ LoRA / 4-bit │
+ └──────────┘     └───────┬───────┘     └──────────────┘
+                         │
+             ┌───────────┼───────────┐
+             ▼           ▼           ▼
+          7 Reward    3-Phase    Resistance
+          Functions  Curriculum    Engine
  ```

+ | Stack | Why |
+ |-------|-----|
+ | **OpenEnv** | Universal RL environment framework (reset / step / state) |
+ | **TRL** | Training loop — reward-weighted SFT, GRPO-style |
+ | **Unsloth** | 4-bit quantization, fast inference on a single T4 |
+ | **Qwen 2.5 3B** | Base LLM — LoRA fine-tuned (r=16, targets q/k/v/o) |
+
+ **7 reward functions** work together: survival, combat, adaptation, anti-cowardice, efficiency, terminal, and opportunity. Each one exists because the agent found an exploit without it.
+
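Composing the seven terms can be sketched as a single function. The per-component formulas come from the reward table in the original README (survival `-damage_taken/100`, combat `+damage_dealt/80`, `+0.8` correct adaptation, `-1.0` healing at high HP, `+1.0` burst ≥200, `+10`/`-8` terminal, `-0.5` idling with stacks); the composition function and its argument names are illustrative, not `rewards.py`:

```python
# Hedged sketch of composing the 7 reward components; formulas are from
# the README's reward table, the function shape is an assumption.
def total_reward(dmg_taken, dmg_dealt, adapted_correctly, healed_at_high_hp,
                 burst, won=None, idle_with_stacks=False):
    r = -dmg_taken / 100                      # survival: punish damage taken
    r += dmg_dealt / 80                       # combat: reward damage dealt
    r += 0.8 if adapted_correctly else 0.0    # adaptation bonus
    r -= 1.0 if healed_at_high_hp else 0.0    # anti-cowardice penalty
    r += 1.0 if burst >= 200 else 0.0         # efficiency: big burst bonus
    r -= 0.5 if idle_with_stacks else 0.0     # opportunity cost of waiting
    if won is True:                           # terminal rewards
        r += 10
    elif won is False:
        r -= 8
    return r

# One strong winning turn: took 50, dealt 400 in a burst, adapted correctly.
total_reward(50, 400, True, False, burst=400, won=True)
```

Summing small, targeted terms like this is what the "each one prevents a specific exploit" claim means: remove any one term and the corresponding degenerate strategy (healing spam, stalling on stacks, chip damage) becomes profitable again.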
+ ---
+
+ ## 🎮 Face Mahoraga Yourself
+
+ ### Interactive Dashboard
+
+ ```bash
+ # Terminal 1 — Wake the boss
+ python api.py                              # FastAPI → localhost:8000
+
+ # Terminal 2 — Enter the arena
+ cd frontend && npm install && npm run dev  # React → localhost:5173
  ```
+
+ ### Gradio (Lightweight)
+
+ ```bash
+ python app.py                              # Gradio → localhost:7860
  ```

+ ### Train Your Own Mahoraga (Kaggle)

+ ```bash
+ # Upload notebooks/meta-mahoraga.ipynb → Kaggle → Enable T4 GPU → Run all
+ # Watch it go from punching bag to Divine General in ~30 minutes
+ ```

+ ### Quick Test

+ ```bash
+ git clone https://github.com/Atishay9828/meta_Mahoraga
+ cd meta_Mahoraga && pip install -r requirements.txt
+ python main.py                             # Watch a random agent get destroyed
+ ```

  ---

+ ## 📂 What's Inside

+ ```
+ meta_Mahoraga/
+ ├── env/
+ │   ├── mahoraga_env.py      # The arena — turn-based RL environment
+ │   ├── enemy.py             # 3-phase curriculum boss AI
+ │   ├── mechanics.py         # Resistance engine + damage math
+ │   ├── rewards.py           # 7 composable reward functions
+ │   └── gym_wrapper.py       # Gymnasium-compatible interface
+ ├── notebooks/
+ │   └── meta-mahoraga.ipynb  # Full training pipeline (Kaggle-ready)
+ ├── frontend/                # React tactical dashboard
+ ├── api.py                   # FastAPI + LLM inference server
+ ├── app.py                   # Gradio interactive UI
+ └── main.py                  # CLI arena
+ ```

  ---

+ <div align="center">
+
+ <br>
+
+ **Meta OpenEnv Hackathon 2026**
+
  *"The more it is hit, the more it adapts. That is the nature of Mahoraga."*
+
+ **Team BANGERS** · [Atishay](https://github.com/Atishay9828) & [Mridul](https://github.com/MridulNegi2005)
+
+ 📓 [Training Notebook](https://www.kaggle.com/code/atishay9828/meta-mahoraga/edit) · 🤗 [Live Demo](https://huggingface.co/spaces/MridulNegi2005/Project-Mahoraga)
+
+ <br>
+
+ ⚔️
+
+ </div>