
title: Qwopus Commander
emoji: 🎮
colorFrom: pink
colorTo: indigo
sdk: static
pinned: true
license: apache-2.0
short_description: Neon survival shooter built end-to-end by Qwopus 3.6 27B

Qwopus Commander

A top-down neon survival shooter, written entirely by Jackrong's Qwopus 3.6 27B at Q5_K_M, served locally via llama.cpp. Benchmark and orchestration by Kyle Hessling.

Click play, click again to skip the tutorial, then WASD to move, mouse to aim, click to fire, Shift to dash.

The build, by the numbers

A single local 27B model produced this entire 3,125-line HTML5 game across nine iterative passes: every line of code, every visual, every audio synth, and every bug fix went through Qwopus 3.6 27B.

| Metric | Value |
| --- | --- |
| Successful iterations | 9 |
| Failed/retried iterations | 2 (1 thinking-loop, 1 32K truncation) |
| Total wall time on the model | ~2 h 02 min |
| Total completion tokens generated | 303,537 |
| Average single-stream throughput | 41.5 tok/s |
| Final game size | 3,125 lines, 96 KB in one self-contained HTML file |
| External dependencies | 0: no CDNs, no images, no audio files. Everything procedural. |

Every enemy ship is drawn with Canvas primitives. Every sound effect is synthesized live via the Web Audio API. The background drone, the chromatic aberration on hit, the laser beam on the Warden boss, the homing missiles on the Carrier — all of it was produced by the model on first ask or refined across one or two iterations.
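
As a rough illustration of what "synthesized live" means, here is a minimal sketch of a laser-style shot built from standard Web Audio nodes. The function name `playShootSound`, the waveform, and every frequency and duration are assumptions chosen for illustration; none of them are taken from the game's source.

```javascript
// Hypothetical sketch: a short "pew" from one oscillator and one gain envelope.
// All numeric values are illustrative assumptions, not taken from index.html.
function playShootSound(audioCtx) {
  const now = audioCtx.currentTime;
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();

  osc.type = "square";
  // A quick downward pitch sweep reads as a laser shot.
  osc.frequency.setValueAtTime(880, now);
  osc.frequency.exponentialRampToValueAtTime(220, now + 0.12);

  // Fast decay envelope; exponential ramps cannot target 0, so stop near it.
  gain.gain.setValueAtTime(0.2, now);
  gain.gain.exponentialRampToValueAtTime(0.001, now + 0.12);

  osc.connect(gain);
  gain.connect(audioCtx.destination);
  osc.start(now);
  osc.stop(now + 0.13);
  return osc;
}

// In a browser, trigger it from a user gesture so the context is allowed to run:
// const audioCtx = new AudioContext();
// document.addEventListener("click", () => playShootSound(audioCtx));
```

Because the sound is described entirely by node parameters, variations such as a hit or an explosion would differ only in waveform, sweep range, and envelope length.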

How it ran

| Setting | Value |
| --- | --- |
| Inference engine | `llama.cpp` (CUDA 12.8, RTX 5090 Blackwell `sm_120a`) |
| Quantization | Q5_K_M (18 GB on disk) |
| Concurrency | `--parallel 1` (single-stream, single user) |
| Context window | `--ctx-size 262144` (full 256 K native) |
| KV cache | `--cache-type-k q8_0 --cache-type-v q8_0` |
| Generation temperature | 0.85 (with a few attempts at 0.6; see Lessons learned) |
| VRAM at load | ~30.6 GB / 32 GB on a stock RTX 5090 |

Each iteration sent the entire current game (often 70–90 KB of code) back as context. Toward the end, prompts were ~30 K input tokens with another ~30 K of fresh output.
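
A minimal sketch of what one such round trip might look like against a local `llama-server`, assuming its OpenAI-compatible `/v1/chat/completions` endpoint on the default port 8080. The function names, the system message, and the prompt wording are hypothetical; the temperature and token-budget defaults mirror the values reported in this README.

```javascript
// Hypothetical sketch of one iteration request. Prompt wording and helper
// names are assumptions; only the request shape follows the OpenAI-style
// chat completions API that llama-server exposes.
function buildIterationRequest(currentHtml, feedback, opts = {}) {
  const { temperature = 0.85, maxTokens = 40000 } = opts;
  return {
    messages: [
      {
        role: "system",
        content: "You are editing a single-file HTML5 game. Return the complete updated file.",
      },
      {
        // The whole current game goes back in as context on every pass.
        role: "user",
        content: `${feedback}\n\nCurrent index.html:\n\n${currentHtml}`,
      },
    ],
    temperature,
    max_tokens: maxTokens,
  };
}

// Sends the request to a local server; the base URL is an assumed default.
async function runIteration(currentHtml, feedback, baseUrl = "http://127.0.0.1:8080") {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildIterationRequest(currentHtml, feedback)),
  });
  if (!res.ok) throw new Error(`llama-server returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Resending the full file each time keeps the model's view of the code authoritative, at the cost of prompt sizes that grow with the game.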

What the model figured out unprompted

Object pooling for bullets, particles, enemies, power-ups, telegraphs, damage numbers, and floating score text — added on iter 1 without being asked.
Web Audio synthesizers for shoot, hit, explosion, dash, hurt, enemy-shoot, and wave-start sounds — seven distinct procedural sound effects, also iter 1.
Parallax star field + drifting nebula gradients + procedural neon glow via shadowBlur for the entire visual language.
Hitstop, screen shake, chromatic aberration, slow-mo death — game-feel polish that the model added when asked to "make it cinematic".
A 77 % accurate self-summary at the end of each iteration explaining what it had just changed.
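
For readers unfamiliar with the first item, here is a minimal sketch of an object pool. The `Pool` class, its field names, and the bullet shape are hypothetical and are not copied from the game; the `prune(keep)` method follows the shape of the release method described in the bug table in the next section.

```javascript
// Hypothetical pool sketch; class, field, and method names are illustrative.
class Pool {
  constructor(factory) {
    this.factory = factory; // creates a fresh object when the free list is empty
    this.free = [];         // recycled objects waiting for reuse
    this.active = [];       // objects currently in play
  }

  // Reuse a recycled object if one exists, otherwise allocate a new one.
  get() {
    const obj = this.free.pop() ?? this.factory();
    obj.alive = true;
    this.active.push(obj);
    return obj;
  }

  // Keep objects for which keep(obj) is true; return the rest to the free list.
  prune(keep) {
    const stillActive = [];
    for (const obj of this.active) {
      if (keep(obj)) stillActive.push(obj);
      else this.free.push(obj);
    }
    this.active = stillActive;
  }
}

const bulletPool = new Pool(() => ({ x: 0, y: 0, alive: false }));
const a = bulletPool.get();
const b = bulletPool.get();
a.alive = false;                      // e.g. the bullet left the screen
bulletPool.prune((obj) => obj.alive); // `a` goes back to the free list
const c = bulletPool.get();           // reuses `a` instead of allocating
console.log(c === a, bulletPool.active.length); // true 2
```

The point of the pattern is to avoid allocating and garbage-collecting hundreds of short-lived objects per second, which matters in a bullet-heavy game loop.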

What the model got wrong (and how it fixed it)

The interesting part of a 27B model's capability profile isn't whether it can write a game but how it debugs one across many context-spanning revisions. Every bug below was found by playtesting and fed back as a prompt; every fix was the model's:

| Iter | Bug | Cause | Fix |
| --- | --- | --- | --- |
| 2 | Aim coordinates were off by half a screen-width | `mouseWorldX = mouseX + camera.x - W/2` had an extra `-W/2`, because `camera.x` was already top-left, not center | Removed the extra subtraction |
| 5 | "Shooting stopped" mid-game | `Pool.get()` orphaned new objects when the pool drained; `bulletPool.active = filter(...)` never returned dead objects to the pool | Added `prune(predicate)` method that properly releases; migrated all 7 pools |
| 6 | Wave 5 (boss wave) never progressed | `bossPool` was the only pool I forgot to mention in the iter-5 migration; killed bosses stayed in `bossPool.active`, so `length === 0` never fired | Added `bossPool.prune(b => b.alive)` |
| 7 | Player became permanently invincible after the first dash | `DASH_INVULN` (0.18 s) is longer than `DASH_DURATION` (0.12 s), but the model decremented `dashInvuln` inside the `if (dashTimer > 0)` branch. Once dash motion ended, invuln froze at ~0.06 s positive forever. | Moved the `dashInvuln` decrement out of the guard, with its own `if (... > 0)` check |
| 8 | Power-up status chips were drawn off-screen on the right | `puX = hpBarX + hpBarW + 8` was anchored past an already-right-flush HP bar | Stacked the chips vertically below the HP bar instead |
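
The iter-7 bug is easy to reproduce in isolation. The sketch below is a hypothetical reconstruction: only the two constants and the names `dashTimer` and `dashInvuln` come from the table; the player object, the update functions, and the 60 Hz timestep are assumptions.

```javascript
// Hypothetical reconstruction of the iter-7 dash bug; the surrounding
// structure is assumed, only the constants and timer names are from the table.
const DASH_DURATION = 0.12; // seconds of dash motion
const DASH_INVULN = 0.18;   // seconds of invulnerability

function startDash(player) {
  player.dashTimer = DASH_DURATION;
  player.dashInvuln = DASH_INVULN;
}

// Buggy version: invulnerability only ticks while the dash is still moving.
function updateBuggy(player, dt) {
  if (player.dashTimer > 0) {
    player.dashTimer -= dt;
    player.dashInvuln -= dt;
  }
}

// Fixed version: each timer has its own guard.
function updateFixed(player, dt) {
  if (player.dashTimer > 0) player.dashTimer -= dt;
  if (player.dashInvuln > 0) player.dashInvuln -= dt;
}

// Runs one dash and reports whether the player is still invincible afterwards.
function simulate(update, seconds, dt = 1 / 60) {
  const player = { dashTimer: 0, dashInvuln: 0 };
  startDash(player);
  for (let t = 0; t < seconds; t += dt) update(player, dt);
  return player.dashInvuln > 0;
}

console.log(simulate(updateBuggy, 2)); // true: invuln is stuck above zero
console.log(simulate(updateFixed, 2)); // false: invuln expired as intended
```

The general lesson is that two timers with different lengths should never share a guard keyed to the shorter one.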

Lessons learned (about driving a small-ish local model on a long codegen task)

Temperature 0.6 + thinking-on = thinking loop. The first retry of iter 2 burned 24,000 tokens of internal reasoning and produced zero visible output. Raising the temperature to 0.7–0.85 fixed it cleanly. Future runs default to 0.8.
`max_tokens` matters more than people think. Iter 8 ran out of budget mid-`</script>` at 32 K. Bumping to 40 K let the model finish a clean closing tag and a post-code summary.
Single-stream inference at Q8 KV uses VRAM extremely well. With `--parallel 1`, a 256 K context, and q8_0 KV cache, the model fit in ~30.6 GB, leaving about 1.4 GB of headroom on a stock 32 GB RTX 5090. No model offload, no swapping, no MTP needed.
The model's per-iteration self-summary is roughly trustworthy, but always verify with grep. It claimed in iter 4 that "enemies now spawn near the player," which was true; it also left the old `spawnEnemyAtEdge` function in place as dead code, which I only confirmed by reading the source.
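
Several of these checks can be automated. The sketch below is a hypothetical post-run check, not part of the original harness: the category names, the `</html>` heuristic, and the regex-based call counter are all assumptions. It relies only on the `finish_reason` field that OpenAI-style chat completion responses carry.

```javascript
// Hypothetical post-run checks; category names and heuristics are assumptions.
function classifyCompletion(response) {
  const choice = response.choices[0];
  const text = (choice.message.content ?? "").trim();

  // No visible output: the whole budget may have gone to hidden reasoning.
  if (text === "") return "no-visible-output";
  // OpenAI-style APIs report "length" when max_tokens cut the output off.
  if (choice.finish_reason === "length") return "truncated";
  // A single-file game should contain a closing </html> tag somewhere.
  if (!/<\/html>/i.test(text)) return "missing-closing-tag";
  return "ok";
}

// Grep-like dead-code check: counts call sites of a plain alphanumeric
// function name, excluding its own `function name(` definition.
function countCalls(source, fnName) {
  const uses = source.match(new RegExp(`\\b${fnName}\\s*\\(`, "g")) ?? [];
  const defs = source.match(new RegExp(`function\\s+${fnName}\\s*\\(`, "g")) ?? [];
  return uses.length - defs.length; // 0 means defined but never called
}

const sample = "function spawnEnemyAtEdge() {}\nfunction spawnNear() {}\nspawnNear();";
console.log(countCalls(sample, "spawnEnemyAtEdge")); // 0
console.log(countCalls(sample, "spawnNear"));        // 1
```

A regex counter like this misses indirect references such as callbacks passed by name, so it complements reading the code without replacing it.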

Watch the run

Source HTML lives in this Space: `index.html` is the entire game, with no build step, no bundler, and no dependencies.

If you want to reproduce: pull `Jackrong/Qwopus3.5-27B-v2-GGUF` at Q5_K_M, run it with `llama-server --ctx-size 262144 --parallel 1 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on --n-gpu-layers 999`, and iterate.

🎮 Click play to start.