title: Qwopus Commander
emoji: 🎮
colorFrom: pink
colorTo: indigo
sdk: static
pinned: true
license: apache-2.0
short_description: Neon survival shooter built end-to-end by Qwopus 3.6 27B

Qwopus Commander

A top-down neon survival shooter, written entirely by Jackrong's Qwopus 3.6 27B at Q5_K_M, served locally via llama.cpp. Benchmark and orchestration by Kyle Hessling.

Click play, click again to skip the tutorial, then WASD to move, mouse to aim, click to fire, Shift to dash.


The build, by the numbers

A single local 27B model produced this entire 3,100-line HTML5 game across nine iterative passes: every line of code, every visual, every audio synth, and every bug fix went through Qwopus 3.6 27B.

| Metric | Value |
| --- | --- |
| Successful iterations | 9 |
| Failed/retried iterations | 2 (1 thinking-loop, 1 32K truncation) |
| Total wall time on the model | ~2 h 02 min |
| Total completion tokens generated | 303,537 |
| Average single-stream throughput | 41.5 tok/s |
| Final game size | 3,125 lines, 96 KB in one self-contained HTML file |
| External dependencies | 0 (no CDNs, no images, no audio files; everything is procedural) |
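
As a quick sanity check, the wall time and throughput figures above can be cross-checked against each other. This is only a sketch: it assumes the wall time is dominated by token generation rather than prompt processing.

```javascript
// Figures copied from the table above.
const completionTokens = 303537;
const tokensPerSecond = 41.5;

// Generation time implied by the average throughput.
const seconds = completionTokens / tokensPerSecond;
const hours = Math.floor(seconds / 3600);
const minutes = Math.round((seconds % 3600) / 60);

console.log(`${hours} h ${String(minutes).padStart(2, "0")} min`); // prints "2 h 02 min"
```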

Every enemy ship is drawn with Canvas primitives. Every sound effect is synthesized live via the Web Audio API. The background drone, the chromatic aberration on hit, the laser beam on the Warden boss, and the homing missiles on the Carrier were all produced by the model on first ask or refined across one or two iterations.

How it ran

| Setting | Value |
| --- | --- |
| Inference engine | llama.cpp (CUDA 12.8, RTX 5090 Blackwell sm_120a) |
| Quantization | Q5_K_M (18 GB on disk) |
| Concurrency | --parallel 1 (single-stream, single user) |
| Context window | --ctx-size 262144 (full 256 K native) |
| KV cache | --cache-type-k q8_0 --cache-type-v q8_0 |
| Generation temperature | 0.85 (with a few attempts at 0.6; see Lessons learned) |
| VRAM at load | ~30.6 GB / 32 GB on a stock RTX 5090 |

Each iteration sent the entire current game (often 70–90 KB of code) back as context. Toward the end, prompts were ~30 K input tokens with another ~30 K of fresh output.
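
To get a feel for the budget, here is a rough sketch of a pre-flight check that a prompt plus its output allowance fits in the context window. The bytes-per-token ratio is an assumption inferred from the figures above (roughly 90 KB of code arriving as roughly 30 K tokens), not a measured tokenizer statistic, and the function names are hypothetical.

```javascript
// Assumption: about 3 bytes of source per token, inferred from the figures above.
const BYTES_PER_TOKEN = 3;
const CTX_SIZE = 262144; // --ctx-size from the settings table

function estimateTokens(sourceText) {
  // Use the UTF-8 byte length so non-ASCII characters are counted fully.
  const bytes = new TextEncoder().encode(sourceText).length;
  return Math.ceil(bytes / BYTES_PER_TOKEN);
}

function fitsContext(promptTokens, maxOutputTokens, ctxSize = CTX_SIZE) {
  // The prompt and the generated reply share the same context window.
  return promptTokens + maxOutputTokens <= ctxSize;
}
```

With these numbers, even a 30 K prompt plus a 40 K output allowance uses well under a third of the 256 K window.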

What the model figured out unprompted

  • Object pooling for bullets, particles, enemies, power-ups, telegraphs, damage numbers, and floating score text, added on iter 1 without being asked.
  • Web Audio synthesizers for shoot, hit, explosion, dash, hurt, enemy-shoot, and wave-start sounds: seven distinct procedural sound effects, also on iter 1.
  • Parallax star field + drifting nebula gradients + procedural neon glow via shadowBlur for the entire visual language.
  • Hitstop, screen shake, chromatic aberration, slow-mo death: game-feel polish that the model added when asked to "make it cinematic".
  • A 77 % accurate self-summary at the end of each iteration explaining what it had just changed.
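
For readers unfamiliar with the pooling pattern in the first bullet, here is a minimal sketch. It is not the code from index.html: the class shape and the assumption that prune keeps the objects its predicate accepts are mine, chosen to match the prune(b => b.alive) call mentioned in the bug table further down.

```javascript
class Pool {
  constructor(factory) {
    this.factory = factory; // creates a fresh object when none are free
    this.free = [];         // released objects waiting to be reused
    this.active = [];       // objects currently in play
  }

  get() {
    // Reuse a released object if one exists, otherwise allocate a new one.
    const obj = this.free.pop() ?? this.factory();
    obj.alive = true;
    // Track every object handed out, including freshly allocated ones,
    // so nothing is orphaned when the free list is empty.
    this.active.push(obj);
    return obj;
  }

  prune(keep) {
    // Keep objects for which keep(obj) is true; return the rest to the free list.
    const stillActive = [];
    for (const obj of this.active) {
      if (keep(obj)) stillActive.push(obj);
      else this.free.push(obj);
    }
    this.active = stillActive;
  }
}

// Usage: a dead bullet is released by prune and reused by the next get().
const bulletPool = new Pool(() => ({ x: 0, y: 0, alive: false }));
const bullet = bulletPool.get();
bullet.alive = false;
bulletPool.prune(o => o.alive);
```

The point of routing removal through prune rather than reassigning active with a plain filter is that dead objects go back to the free list instead of being dropped.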

What the model got wrong (and how it fixed it)

The interesting part of a 27B model's capability profile isn't whether it can write a game; it's how it debugs one across many context-spanning revisions. Every bug below was found by playtesting and fed back as a prompt; every fix was the model's:

| Iter | Bug | Cause | Fix |
| --- | --- | --- | --- |
| 2 | Aim coordinates were off by half a screen-width | mouseWorldX = mouseX + camera.x - W/2 had an extra -W/2, because camera.x was already top-left, not center | Removed the extra subtraction |
| 5 | "Shooting stopped" mid-game | Pool.get() orphaned new objects when the pool drained; bulletPool.active = filter(...) never returned dead objects to the pool | Added prune(predicate) method that properly releases; migrated all 7 pools |
| 6 | Wave 5 (boss wave) never progressed | bossPool was the only pool I forgot to mention in the iter-5 migration; killed bosses stayed in bossPool.active, so length === 0 never fired | Added bossPool.prune(b => b.alive) |
| 7 | Player became permanently invincible after the first dash | DASH_INVULN (0.18 s) is longer than DASH_DURATION (0.12 s), but the model decremented dashInvuln inside the if (dashTimer > 0) branch. Once dash motion ended, invuln froze at ~0.06 s positive forever. | Moved the dashInvuln decrement out of the guard, with its own if (... > 0) check |
| 8 | Power-up status chips were drawn off-screen on the right | puX = hpBarX + hpBarW + 8 was anchored past an already-right-flush HP bar | Stacked the chips vertically below the HP bar instead |
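
The iter-7 dash bug is the easiest one to get wrong by eye, so here is a sketch of the fixed timer logic. The two constants and the dashTimer/dashInvuln names come from the table above; the player object and the function names are hypothetical stand-ins, not the game's actual code.

```javascript
const DASH_DURATION = 0.12; // seconds of dash motion
const DASH_INVULN = 0.18;   // seconds of invulnerability, longer than the dash itself

function startDash(player) {
  player.dashTimer = DASH_DURATION;
  player.dashInvuln = DASH_INVULN;
}

function updateDash(player, dt) {
  if (player.dashTimer > 0) {
    player.dashTimer = Math.max(0, player.dashTimer - dt);
    // Dash movement would be applied here.
  }
  // Counted down with its own guard, outside the dashTimer branch,
  // so it keeps ticking after the dash motion has ended.
  if (player.dashInvuln > 0) {
    player.dashInvuln = Math.max(0, player.dashInvuln - dt);
  }
}

function isInvulnerable(player) {
  return player.dashInvuln > 0;
}
```

In the buggy version the second decrement sat inside the first if, so it stopped running as soon as dashTimer reached zero and left about 0.06 s of invulnerability on the clock forever.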

Lessons learned (about driving a small-ish local model on a long codegen task)

  • Temperature 0.6 + thinking-on = thinking loop. The first retry of iter 2 burned 24,000 tokens of internal reasoning and produced zero visible output. Bumping to 0.7–0.85 fixes it cleanly. Future runs default to 0.8.
  • max_tokens matters more than people think. Iter 8 ran out of budget in the middle of the closing </script> tag at 32 K. Bumping to 40 K let the model finish a clean closing tag and a post-code summary.
  • Single-stream inference at Q8 KV uses VRAM extremely well. At parallel=1, ctx=256 K, and q8 KV, the model fit in 30.6 GB, leaving about 1.4 GB of headroom on a stock 32 GB RTX 5090. No model offload, no swapping, no MTP needed.
  • The model's per-iteration self-summary is roughly trustworthy, but always verify with grep. It claimed in iter 4 that "enemies now spawn near the player," which was true; it also kept the old spawnEnemyAtEdge function as dead code, which I only confirmed by reading the code.
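
The "verify with grep" step in the last bullet can be partly automated. The sketch below is a crude heuristic of my own, not something used in the original run: it lists functions that are declared with the function keyword but whose name appears nowhere else in the file. It ignores arrow functions and methods, and a name mentioned only in a comment or string still counts as a use.

```javascript
function findUncalledFunctions(source) {
  // Collect names declared as `function name(`.
  const declared = [...source.matchAll(/\bfunction\s+([A-Za-z_]\w*)\s*\(/g)]
    .map(match => match[1]);

  return declared.filter(name => {
    // Count whole-word occurrences of the identifier across the file.
    const uses = source.match(new RegExp(`\\b${name}\\b`, "g")) ?? [];
    // Exactly one occurrence means only the declaration itself was found.
    return uses.length === 1;
  });
}
```

Run against the game source (for example, the text of index.html read from disk), a leftover like spawnEnemyAtEdge would show up in the returned list if nothing else references it.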

Watch the run

Source HTML lives in this Space: index.html is the entire game, with no build step, no bundler, and no dependencies.

If you want to reproduce: pull Jackrong/Qwopus3.5-27B-v2-GGUF Q5_K_M, run with llama-server --ctx-size 262144 --parallel 1 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on --n-gpu-layers 999, and iterate.
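
One iteration of the loop might look like the sketch below. It assumes llama-server is listening on its default port 8080 and that you talk to its OpenAI-compatible /v1/chat/completions route; the prompt wording and function names are hypothetical, not the orchestration script used for this run. The temperature and max_tokens defaults follow the lessons above.

```javascript
// Assumption: llama-server on its default port, OpenAI-compatible chat route.
const ENDPOINT = "http://localhost:8080/v1/chat/completions";

function buildIterationRequest(currentHtml, feedback, { temperature = 0.8, maxTokens = 40000 } = {}) {
  return {
    messages: [
      // Hypothetical prompt wording; the original prompts are not published here.
      { role: "system", content: "You are editing a single-file HTML5 game. Return the complete updated file." },
      { role: "user", content: `${feedback}\n\nCurrent index.html:\n${currentHtml}` },
    ],
    temperature,
    max_tokens: maxTokens,
  };
}

async function runIteration(currentHtml, feedback, options) {
  const response = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildIterationRequest(currentHtml, feedback, options)),
  });
  if (!response.ok) throw new Error(`llama-server returned HTTP ${response.status}`);
  const data = await response.json();
  const choice = data.choices[0];
  // finish_reason "length" means the reply hit max_tokens and is probably truncated.
  return { text: choice.message.content, truncated: choice.finish_reason === "length" };
}
```

Checking the truncated flag after each call is a cheap way to catch the kind of cut-off closing tag described under Lessons learned before the result is saved over index.html.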

🎮 Click play to start.