title: Qwopus Commander
emoji: 🎮
colorFrom: pink
colorTo: indigo
sdk: static
pinned: true
license: apache-2.0
short_description: Neon survival shooter built end-to-end by Qwopus 3.6 27B
Qwopus Commander
A top-down neon survival shooter, written entirely by Jackrong's Qwopus 3.6 27B at Q5_K_M, served locally via llama.cpp. Benchmark and orchestration by Kyle Hessling.
Click play, click again to skip the tutorial, then WASD to move, mouse to aim, click to fire, Shift to dash.
The build, by the numbers
A single local 27B model produced this entire 3,100-line HTML5 game across nine iterative passes: every line of code, every visual, every audio synth, and every bug fix went through Qwopus 3.6 27B.
| Metric | Value |
|---|---|
| Successful iterations | 9 |
| Failed/retried iterations | 2 (1 thinking-loop, 1 32K truncation) |
| Total wall time on the model | ~2 h 02 min |
| Total completion tokens generated | 303,537 |
| Average single-stream throughput | 41.5 tok/s |
| Final game size | 3,125 lines, 96 KB in one self-contained HTML file |
| External dependencies | 0 (no CDNs, no images, no audio files; everything is procedural) |
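As a quick sanity check on the table, the wall time can be recomputed from the token count and the average throughput. This is only a sketch: the helper name `impliedWallTime` is made up for illustration, and the calculation ignores prompt-processing time, so it is a rough cross-check rather than a measurement.

```javascript
// Figures copied from the metrics table above.
const completionTokens = 303537;
const tokensPerSecond = 41.5;

// Generation time implied by throughput alone (prompt processing is ignored).
function impliedWallTime(tokens, rate) {
  const totalSeconds = Math.round(tokens / rate);
  return {
    totalSeconds,
    hours: Math.floor(totalSeconds / 3600),
    minutes: Math.floor((totalSeconds % 3600) / 60),
    seconds: totalSeconds % 60,
  };
}

const wallTime = impliedWallTime(completionTokens, tokensPerSecond);
console.log(`${wallTime.hours} h ${wallTime.minutes} min ${wallTime.seconds} s`);
```

The result lands just under 2 h 02 min, which agrees with the rounded figure in the table.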
Every enemy ship is drawn with Canvas primitives. Every sound effect is synthesized live via the Web Audio API. The background drone, the chromatic aberration on hit, the laser beam on the Warden boss, the homing missiles on the Carrier: all of it was produced by the model on first ask or refined across one or two iterations.
How it ran
| Setting | Value |
|---|---|
| Inference engine | llama.cpp (CUDA 12.8, RTX 5090 Blackwell sm_120a) |
| Quantization | Q5_K_M (18 GB on disk) |
| Concurrency | --parallel 1 (single-stream, single user) |
| Context window | --ctx-size 262144 (full 256 K native) |
| KV cache | --cache-type-k q8_0 --cache-type-v q8_0 |
| Generation temperature | 0.85 (with a few attempts at 0.6; see Lessons learned) |
| VRAM at load | ~30.6 GB / 32 GB on a stock RTX 5090 |
Each iteration sent the entire current game (often 70 to 90 KB of code) back as context. Toward the end, prompts were ~30 K input tokens with another ~30 K of fresh output.
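One iteration of that loop might look roughly like the sketch below. It assumes the server is reached through llama-server's OpenAI-compatible `/v1/chat/completions` endpoint on its default port; the helper names, the prompt wording, and the default `temperature` and `max_tokens` values are illustrative assumptions, not the orchestration script that was actually used.

```javascript
// Hypothetical helper: builds the JSON body for one iteration.
// The prompt wording and the default values are assumptions.
function buildIterationRequest(currentHtml, feedback, options = {}) {
  const { temperature = 0.8, maxTokens = 40000 } = options;
  return {
    messages: [
      {
        role: "user",
        content:
          "Here is the current game:\n\n" + currentHtml +
          "\n\nPlaytest feedback:\n" + feedback +
          "\n\nReturn the complete updated HTML file.",
      },
    ],
    temperature,
    max_tokens: maxTokens,
  };
}

// Hypothetical helper: sends one iteration to a local server and returns the
// model's reply. The URL assumes llama-server's default port (8080).
async function runIteration(
  currentHtml,
  feedback,
  url = "http://127.0.0.1:8080/v1/chat/completions"
) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildIterationRequest(currentHtml, feedback)),
  });
  if (!response.ok) throw new Error(`server returned ${response.status}`);
  const data = await response.json();
  return data.choices[0].message.content;
}
```

Sending the whole file each time keeps the prompt simple at the cost of a large context, which is why the 256 K window matters for this workflow.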
What the model figured out unprompted
- Object pooling for bullets, particles, enemies, power-ups, telegraphs, damage numbers, and floating score text, added on iter 1 without being asked.
- Web Audio synthesizers for shoot, hit, explosion, dash, hurt, enemy-shoot, and wave-start sounds: seven distinct procedural sound effects, also iter 1.
- Parallax star field + drifting nebula gradients + procedural neon glow via `shadowBlur` for the entire visual language.
- Hitstop, screen shake, chromatic aberration, slow-mo death: game-feel polish that the model added when asked to "make it cinematic".
- A 77 % accurate self-summary at the end of each iteration explaining what it had just changed.
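For readers unfamiliar with object pooling, here is a minimal sketch of the idea. The names `Pool`, `get`, `active`, and `prune` echo identifiers mentioned in this README, but the bodies are an illustrative guess and not the game's actual code.

```javascript
// Minimal object pool (illustrative; not copied from index.html).
class Pool {
  constructor(factory, size) {
    this.factory = factory;
    this.free = Array.from({ length: size }, () => factory());
    this.active = [];
  }

  // Take an object from the free list, growing the pool if it is empty.
  // Either way the object is tracked in `active`, so it is never orphaned.
  get() {
    const obj = this.free.pop() ?? this.factory();
    this.active.push(obj);
    return obj;
  }

  // Keep objects for which keep(obj) is true; return the rest to the free list.
  prune(keep) {
    const stillActive = [];
    for (const obj of this.active) {
      if (keep(obj)) stillActive.push(obj);
      else this.free.push(obj);
    }
    this.active = stillActive;
  }
}

// Usage: spawn two bullets, kill one, then release it back to the pool.
const bulletPool = new Pool(() => ({ x: 0, y: 0, alive: false }), 2);
const first = bulletPool.get();
const second = bulletPool.get();
first.alive = true;
second.alive = false;
bulletPool.prune((b) => b.alive);
console.log(bulletPool.active.length, bulletPool.free.length);
```

The point of the pattern is to avoid allocating and garbage-collecting short-lived objects every frame; the cost is that every pool needs a release path, which is easy to forget.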
What the model got wrong (and how it fixed it)
The interesting part of a 27B model's capability profile isn't whether it can write a game; it's how it debugs one across many context-spanning revisions. Every bug below was found by playtesting and fed back as a prompt; every fix was the model's:
| Iter | Bug | Cause | Fix |
|---|---|---|---|
| 2 | Aim coordinates were off by half a screen-width | `mouseWorldX = mouseX + camera.x - W/2`: the extra `- W/2` was wrong because `camera.x` was already the top-left corner, not the center | Removed the extra subtraction |
| 5 | "Shooting stopped" mid-game | `Pool.get()` orphaned new objects when the pool drained; `bulletPool.active = filter(...)` never returned dead objects to the pool | Added a `prune(predicate)` method that properly releases dead objects; migrated all 7 pools |
| 6 | Wave 5 (boss wave) never progressed | `bossPool` was the only pool I forgot to mention in the iter-5 migration; killed bosses stayed in `bossPool.active`, so the `length === 0` check never fired | Added `bossPool.prune(b => b.alive)` |
| 7 | Player became permanently invincible after the first dash | `DASH_INVULN` (0.18 s) is longer than `DASH_DURATION` (0.12 s), but the model decremented `dashInvuln` inside the `if (dashTimer > 0)` branch. Once dash motion ended, invuln froze at ~0.06 s positive forever. | Moved the `dashInvuln` decrement out of the guard, with its own `if (... > 0)` check |
| 8 | Power-up status chips were drawn off-screen on the right | `puX = hpBarX + hpBarW + 8` was anchored past an already-right-flush HP bar | Stacked the chips vertically below the HP bar instead |
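The iter-7 bug is the easiest one to misread from a table, so here is a small reconstruction of its shape. Only the two constants and the variable names `dashTimer` and `dashInvuln` come from the table; the functions, the player object, and the 60 FPS time step are assumptions made for illustration.

```javascript
// Constants taken from the bug table; everything else is hypothetical.
const DASH_DURATION = 0.12; // seconds of dash motion
const DASH_INVULN = 0.18;   // seconds of invulnerability

function startDash(player) {
  player.dashTimer = DASH_DURATION;
  player.dashInvuln = DASH_INVULN;
}

// Buggy shape: invulnerability only ticks down while the dash is still moving,
// so it stops decreasing as soon as dashTimer reaches zero.
function updateDashBuggy(player, dt) {
  if (player.dashTimer > 0) {
    player.dashTimer -= dt;
    player.dashInvuln -= dt;
  }
}

// Fixed shape: each timer has its own guard and is clamped at zero.
function updateDashFixed(player, dt) {
  if (player.dashTimer > 0) player.dashTimer = Math.max(0, player.dashTimer - dt);
  if (player.dashInvuln > 0) player.dashInvuln = Math.max(0, player.dashInvuln - dt);
}

// Run one dash and step the given update function for `seconds` of game time.
function simulate(update, seconds, dt = 1 / 60) {
  const player = { dashTimer: 0, dashInvuln: 0 };
  startDash(player);
  for (let t = 0; t < seconds; t += dt) update(player, dt);
  return player;
}

console.log(simulate(updateDashBuggy, 1).dashInvuln > 0);   // invuln never expires
console.log(simulate(updateDashFixed, 1).dashInvuln === 0); // invuln expires
```

Any hit check of the form `if (player.dashInvuln > 0) return;` would then ignore damage forever under the buggy update, which matches the symptom seen in playtesting.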
Lessons learned (about driving a small-ish local model on a long codegen task)
- Temperature 0.6 + thinking-on = thinking loop. First retry of iter 2 burned 24,000 tokens of internal reasoning and produced zero visible output. Raising it to between 0.7 and 0.85 fixes it cleanly. Future runs default to 0.8.
- `max_tokens` matters more than people think. Iter 8 ran out of budget mid-`</script>` at 32 K. Bumping to 40 K let the model finish a clean closing tag and a post-code summary.
- Single-stream inference at Q8 KV uses VRAM extremely well. At parallel=1, ctx=256 K, q8 KV, the model fit in 30.6 GB with about 1.4 GB of headroom on a stock 32 GB RTX 5090. No model offload, no swapping, no MTP needed.
- The model's per-iteration self-summary is roughly trustworthy, but always verify with `grep`. It claimed in iter 4 that "enemies now spawn near the player," which was true; it also kept the old `spawnEnemyAtEdge` function as dead code, which I had to check by reading.
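The first two failure modes above (no visible output, and truncation at the token limit) can be caught mechanically before a result is accepted. The sketch below is a heuristic under stated assumptions: `checkGeneration` is a hypothetical helper, and it assumes an OpenAI-style response in which `finish_reason` is `"length"` when generation stops at `max_tokens`.

```javascript
// Hypothetical post-generation check. `choice` is one entry of the `choices`
// array in an OpenAI-style chat completion response.
function checkGeneration(choice) {
  const text = choice.message?.content ?? "";
  const problems = [];

  // The server reports "length" when generation stopped at the token limit.
  if (choice.finish_reason === "length") problems.push("hit max_tokens");

  // A thinking loop can use the whole budget without emitting any answer.
  if (text.trim() === "") problems.push("no visible output");

  // Crude structural checks; "<script" inside a JS string would skew the count.
  const opens = (text.match(/<script\b/gi) ?? []).length;
  const closes = (text.match(/<\/script>/gi) ?? []).length;
  if (opens !== closes) problems.push("unbalanced <script> tags");
  if (!/<\/html>/i.test(text)) problems.push("missing </html>");

  return problems;
}

// Example: a reply cut off in the middle of the script block.
console.log(checkGeneration({
  finish_reason: "length",
  message: { content: "<html><script>let x = 1;" },
}));
```

An empty result means none of these checks fired; it does not mean the game works, so playtesting is still the real test.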
Watch the run
Source HTML lives in this Space โ index.html is the entire game, no build step, no bundler, no dependencies.
If you want to reproduce: pull `Jackrong/Qwopus3.5-27B-v2-GGUF` Q5_K_M, run with `llama-server --ctx-size 262144 --parallel 1 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on --n-gpu-layers 999`, and iterate.
🎮 Click play to start.