---
title: OpenCS2 Dataset - Viewer
emoji: 🎯
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Browse the CS2 dataset by match, map, round, and POV.
---
# OpenCS2 Dataset - Viewer

Browser for [`blanchon/cs2_dataset_render`](https://huggingface.co/datasets/blanchon/cs2_dataset_render), the OpenCS2 dataset: a rendered Counter-Strike 2 dataset built from professional HLTV demos. The viewer lists every match, map, and round in the dataset and plays the 10 synchronized player POVs back-to-back on a single timeline, without downloading the full archive.
It's a pure-frontend SvelteKit app: parquet indexes are read in the browser via [`hyparquet`](https://github.com/hyparquet/hyparquet), preview MP4s stream from Hugging Face, and 10 chunked players stay in sync through a custom [`mediabunny`](https://mediabunny.dev/) pipeline.
## Links
- **Dataset:** <https://huggingface.co/datasets/blanchon/cs2_dataset_render>
- **Live viewer (HF Space):** <https://huggingface.co/spaces/blanchon/opencs2-dataset-viewer>
- **Source dataset (raw demos):** <https://huggingface.co/datasets/blanchon/cs2_dataset_demo>
- **GitHub:** <https://github.com/julien-blanchon/opencs2-dataset>
- **Author:** [Julien Blanchon](https://guybrush.ink/) ([@JulienBlanchon](https://x.com/JulienBlanchon))
## Motivation
Counter-Strike 2 is an interesting environment for sequential decision-making: long horizons, partial observability, dense visual signal, spatialized audio, 5 vs 5 multi-agent dynamics, and a competitive equilibrium that is genuinely hard. Pro HLTV demos give us tens of thousands of hours of expert play, but until now they've lived in a binary `.dem` format that models cannot train on directly.
This dataset turns those demos into rendered, frame-accurate, fully-annotated training data. A few of the things it makes tractable:
- **Behaviour cloning / VLA policies.** `(frame, audio) → action` for vision-conditioned action models.
- **Inverse Dynamics Models (IDM).** `(frame_t, frame_{t+k}) → action` to recover actions from unlabelled video, the workhorse for VPT-style pre-training.
- **Forward dynamics / world models.** `(frame_t, action_t) → frame_{t+1}` with all 10 player POVs of the same world state available as supervision.
- **Spatial-audio conditioning.** Per-player stereo recorded relative to each agent's position and orientation, so models can learn to localize footsteps, gunfire, and callouts.
- **Multi-agent training.** All 10 perspectives of the same round are kept aligned tick-for-tick, which is useful for collaborative policies, opponent modelling, and multi-player world models.
- **Multi-view SfM / depth benchmark.** Ground-truth camera intrinsics and metric depth maps from 10 synchronized viewpoints make this a clean reference for structure-from-motion, multi-view stereo, and camera pose estimation.
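
As a rough illustration of how the inverse-dynamics formulation above turns into training samples, the sketch below pairs frame indices `t` and `t+k` with the actions taken in between. It is plain Python over index lists; `make_idm_pairs` and the per-frame `actions` list are hypothetical names for illustration, not part of the dataset's API.

```python
from typing import List, Sequence, Tuple


def make_idm_pairs(
    actions: Sequence[object], k: int = 1
) -> List[Tuple[int, int, Sequence[object]]]:
    """Build (t, t+k, actions[t:t+k]) samples for an inverse dynamics model.

    Assumption: actions[i] is the input applied between frame i and
    frame i+1, so a clip with N actions has N+1 frames.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    n_frames = len(actions) + 1
    # One sample per start frame t whose partner frame t+k still exists.
    return [(t, t + k, actions[t:t + k]) for t in range(n_frames - k)]


# Toy clip: 4 frames, 3 actions between them (action labels are made up).
pairs = make_idm_pairs(["W", "W+fire", "A"], k=2)
```

A behaviour-cloning sample is the special case where the model sees only frame `t` and predicts `actions[t]`.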
## What's recorded
For every chunk (≤ 1 minute, one player POV):
- **Video:** 1280×720 @ 32 fps, near-lossless H.264. One stream per player; ten POVs per round, all tick-aligned.
- **Audio:** per-player stereo, mixed from each agent's position and orientation (footsteps, gunfire, callouts).
- **Inputs:** every tick: keyboard state, mouse delta, fire/jump/use, weapon switches.
- **World state:** every tick, for all 10 players: position, velocity, view yaw/pitch, camera intrinsics, health, armor, ammo, primary/secondary weapon, alive flag.
- **G-buffers (coming soon):** per-pixel luminance, depth map, and vertex-ID map for self-supervised pretraining and dense prediction heads.
- **Tick-perfect alignment:** every signal is sampled on the same CS2 tick clock. No drift, no resampling artifacts, no per-stream timestamp reconciliation.
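
Because video frames and per-tick signals share one clock, looking up the frame for a given tick reduces to integer arithmetic. The sketch below shows one way to do that mapping. It assumes a fixed 64 Hz tick rate, which this README does not state, and a known recording start tick; `frame_for_tick` and `tick_for_frame` are hypothetical helper names.

```python
def frame_for_tick(tick: int, start_tick: int, tick_rate: int = 64, fps: int = 32) -> int:
    """Index of the video frame being shown at `tick`.

    Assumption: the recording starts exactly at `start_tick` and the game
    runs at `tick_rate` ticks per second (64 is assumed, not confirmed).
    """
    if tick < start_tick:
        raise ValueError("tick precedes the start of the recording")
    # Integer math only, so no floating-point drift accumulates.
    return (tick - start_tick) * fps // tick_rate


def tick_for_frame(frame: int, start_tick: int, tick_rate: int = 64, fps: int = 32) -> int:
    """First tick covered by video frame `frame`."""
    return start_tick + frame * tick_rate // fps


one_second_in = frame_for_tick(1064, start_tick=1000)
```

With these defaults each frame spans two ticks, so per-tick inputs can either be subsampled or grouped in pairs per frame.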
## Dataset
`blanchon/cs2_dataset_render`, derived from `blanchon/cs2_dataset_demo`. Licensed under `CC-BY-4.0`.
In the `previews` and `chunks` configs, each row is a one-minute-or-shorter chunk of a single player's POV. Four configs are exposed:
| Config | Row | Use |
| -------------------- | ------------------------------------------------------------------------------------- | ------------------------------------- |
| `previews` (default) | One low-res `preview.mp4` per chunk + 1 Hz inputs/world sidecars | Cheap browsing, viewer, sanity checks |
| `chunks` | Path-only references to `video.mp4` + `audio.wav`, with embedded inputs/world streams | Full-resolution training |
| `matches` | One row per `(match_id, map_name)` with team/event/date metadata | Index / filtering |
| `rounds` | One row per `(match_id, map_name, round)` with tick boundaries | Index / filtering |
### Filesystem layout
```text
index/
  manifest-<machine>-<uuid>.parquet   # matches index
  rounds-<machine>-<uuid>.parquet     # rounds index
data/
  match_id=<id>/map_name=<map>/player=<0-9>/
    chunks-preview-<machine>-<uuid>.parquet
    chunks-full-<machine>-<uuid>.parquet
    chunks/chunk_<n>/{video.mp4,audio.wav}
    previews/chunk_<n>/{preview.mp4,inputs.preview.json,world.preview.jsonl}
```
Hive-style `key=value` partitioning lets you prune at the path level. Recording starts at `freeze_end_tick` and stops at the player's death tick (or round end for survivors), so player streams have different durations within the same round.
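
To make path-level pruning concrete, here is a minimal sketch that pulls the `key=value` segments out of a path and filters on them before any file is opened. `parse_hive_path` is a hypothetical helper, and the match ID and map name in the example are made up.

```python
from typing import Dict


def parse_hive_path(path: str) -> Dict[str, str]:
    """Extract key=value partition segments from a Hive-style path."""
    parts: Dict[str, str] = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:  # keep only segments of the form key=value
            parts[key] = value
    return parts


# Hypothetical path following the layout above (IDs are invented).
example = "data/match_id=12345/map_name=de_mirage/player=3/chunks/chunk_0/video.mp4"
info = parse_hive_path(example)

# Partition values are strings, so compare against "3", not 3.
keep = info["map_name"] == "de_mirage" and info["player"] == "3"
```

Applying the same check to a file listing lets you skip whole matches, maps, or players without reading any parquet data.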
### Quick query
```python
from datasets import load_dataset

# Default lightweight preview rows
previews = load_dataset("blanchon/cs2_dataset_render", split="train", streaming=True)

# Full training rows, columnar load
chunks = load_dataset(
    "blanchon/cs2_dataset_render",
    "chunks",
    split="train",
    streaming=True,
    columns=["video", "audio", "inputs", "worlds", "match_id", "round", "player"],
    filters=[("player", "==", 0)],
)
```
```sql
-- Match index via DuckDB
SELECT match_id, map_name, team1, team2, event, match_date
FROM 'hf://datasets/blanchon/cs2_dataset_render/index/manifest-*.parquet'
LIMIT 20;
```
## Develop
```sh
bun install
bun run dev # vite dev server
bun run build # static build → dist/
bun run preview # serve the build
bun run check # svelte-check
```
The repo ships a `Dockerfile` and `serve.ts` that mirror the Hugging Face Space deployment.
### Build flags
- `PUBLIC_DISABLE_EVAL=1`: hides the evaluation surface (header toggle, eval bar, flag dialog) for public deploys. Set at build time only (`PUBLIC_DISABLE_EVAL=1 bun run build`); the eval module stays in the bundle but is never rendered.
## Viewer internals
- **`hyparquet` + `hyparquet-compressors`** read the match and round parquet shards directly from `hf://` URLs. No server, no DuckDB, no WASM bundle larger than necessary.
- **`mediabunny`** powers the 10 chunked POV players. One round's worth of `preview.mp4` chunks per player is concatenated into a single virtual timeline; players who died early have shorter timelines and gracefully fall out.
- **`world.preview.jsonl`** drives the minimap (player positions, team, alive/dead, weapon) and the per-tick state overlay.
- **`inputs.preview.json`** drives the input-overlay HUD (movement keys, mouse, fire/jump).
The whole viewer is a static SvelteKit build; there is no backend.
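
The virtual-timeline idea behind the chunked players can be sketched independently of any media library: given per-chunk durations, a global playback time maps to a chunk index plus a local offset, and a player whose timeline has ended simply returns nothing. This is an illustration of the arithmetic only, written in Python rather than the viewer's own code; it does not use or mirror `mediabunny`'s API, and `build_offsets` and `locate` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Optional, Tuple


def build_offsets(durations: List[float]) -> List[float]:
    """Start time of each chunk on the virtual timeline, plus the total at the end."""
    return [0.0, *accumulate(durations)]


def locate(offsets: List[float], t: float) -> Optional[Tuple[int, float]]:
    """Map global time t (seconds) to (chunk_index, local_time).

    Returns None when t is outside this player's timeline, e.g. after
    the player has died and their recording has stopped.
    """
    if t < 0 or t >= offsets[-1]:
        return None
    i = bisect_right(offsets, t) - 1  # last chunk starting at or before t
    return i, t - offsets[i]


# Example: two full one-minute chunks and a short final one (made-up durations).
offsets = build_offsets([60.0, 60.0, 12.5])
position = locate(offsets, 75.0)
```

Running `locate` for each of the 10 players against one shared clock yields the per-player seek targets; players with shorter timelines return `None` and can be dropped from the grid.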
## Citation
```bibtex
@misc{blanchon2026opencs2,
  author = {Julien Blanchon},
  title = {OpenCS2 Dataset},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://github.com/julien-blanchon/opencs2-dataset}},
}
```
## License
Code: MIT. Dataset: CC-BY-4.0.