---
title: OpenCS2 Dataset - Viewer
emoji: 🎯
colorFrom: yellow
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Browse the CS2 dataset by match, map, round, and POV.
---

# OpenCS2 Dataset - Viewer

![OpenCS2 Dataset: 10 tick-aligned player POVs, inputs, world state, and audio, built from professional HLTV demos.](static/header.webp)

Browser for [`blanchon/cs2_dataset_render`](https://huggingface.co/datasets/blanchon/cs2_dataset_render), the OpenCS2 dataset: a rendered Counter-Strike 2 dataset built from professional HLTV demos. The viewer lists every match, map, and round in the dataset and plays the 10 player POVs of a round in sync, each stitched from its chunks into a single timeline, without downloading the full archive.

It's a pure-frontend SvelteKit app: parquet indexes are read in the browser via [`hyparquet`](https://github.com/hyparquet/hyparquet), preview MP4s stream from Hugging Face, and 10 chunked players stay in sync through a custom [`mediabunny`](https://mediabunny.dev/) pipeline.

## Links

- **Dataset:** <https://huggingface.co/datasets/blanchon/cs2_dataset_render>
- **Live viewer (HF Space):** <https://huggingface.co/spaces/blanchon/opencs2-dataset-viewer>
- **Source dataset (raw demos):** <https://huggingface.co/datasets/blanchon/cs2_dataset_demo>
- **GitHub:** <https://github.com/julien-blanchon/opencs2-dataset>
- **Author:** [Julien Blanchon](https://guybrush.ink/) ([@JulienBlanchon](https://x.com/JulienBlanchon))

## Motivation

Counter-Strike 2 is an interesting environment for sequential decision-making: long horizons, partial observability, dense visual signal, spatialized audio, 5v5 multi-agent dynamics, and a competitive equilibrium that is genuinely hard. Pro HLTV demos give us tens of thousands of hours of expert play, but until now they've lived in a binary `.dem` format that is not directly trainable.

This dataset turns those demos into rendered, frame-accurate, fully-annotated training data. A few of the things it makes tractable:

- **Behaviour cloning / VLA policies.** `(frame, audio) → action` for vision-conditioned action models.
- **Inverse Dynamics Models (IDM).** `(frame_t, frame_{t+k}) → action` to recover actions from unlabelled video, the workhorse for VPT-style pre-training.
- **Forward dynamics / world models.** `(frame_t, action_t) → frame_{t+1}` with all 10 player POVs of the same world state available as supervision.
- **Spatial-audio conditioning.** Per-player stereo recorded relative to each agent's position and orientation, so models can learn to localize footsteps, gunfire, and callouts.
- **Multi-agent training.** All 10 perspectives of the same round are kept aligned tick-for-tick β€” useful for collaborative policies, opponent modelling, and multi-player world models.
- **Multi-view SfM / depth benchmark.** Ground-truth camera intrinsics and metric depth maps from 10 synchronized viewpoints make this a clean reference for structure-from-motion, multi-view stereo, and camera pose estimation.
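As an illustration of the IDM setup, here is a minimal sketch of how `(frame_t, frame_{t+k})` training windows could be cut from one aligned chunk. It assumes frames and per-frame actions have already been decoded into equal-length, tick-aligned sequences; `idm_pairs` and its parameters are hypothetical and not part of the dataset tooling.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

F = TypeVar("F")  # frame type, e.g. a decoded image array
A = TypeVar("A")  # per-frame action record (hypothetical shape)


def idm_pairs(
    frames: Sequence[F], actions: Sequence[A], k: int = 1, stride: int = 1
) -> Iterator[Tuple[F, F, Sequence[A]]]:
    """Yield (frame_t, frame_{t+k}, actions[t:t+k]) windows from one chunk.

    The target is the run of actions taken between the two frames, which is
    what an inverse dynamics model has to recover.
    """
    if len(frames) != len(actions):
        raise ValueError("frames and actions must be aligned (same length)")
    if k < 1 or stride < 1:
        raise ValueError("k and stride must be >= 1")
    # Stop early enough that frame t + k still exists.
    for t in range(0, len(frames) - k, stride):
        yield frames[t], frames[t + k], actions[t : t + k]
```

Larger `k` gives the model more motion to work with but a longer action sequence to predict; `stride` only thins out overlapping windows.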

## What's recorded

For every chunk (≤ 1 minute, one player POV):

- **Video**: 1280×720 @ 32 fps, near-lossless H.264. One stream per player; all ten POVs of a round are tick-aligned.
- **Audio**: per-player stereo, mixed from each agent's position and orientation (footsteps, gunfire, callouts).
- **Inputs**: keyboard state, mouse delta, fire/jump/use, and weapon switches, recorded every tick.
- **World state**: position, velocity, view yaw/pitch, camera intrinsics, health, armor, ammo, primary/secondary weapon, and alive flag for all 10 players, recorded every tick.
- **G-buffers (coming soon)**: per-pixel luminance, depth map, and vertex-ID map for self-supervised pretraining and dense prediction heads.
- **Tick-perfect alignment**: every signal is sampled on the same CS2 tick clock. No drift, no resampling artifacts, no per-stream timestamp reconciliation.
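The per-player camera intrinsics are what make the SfM and depth use cases possible. Their exact storage format is not described in this README, so the following is only a sketch under assumptions: if all you have is a horizontal field of view for the 1280×720 frames, a standard pinhole matrix can be built as below. Square pixels, a centered principal point, and an OpenCV-style camera frame are assumptions, and `pinhole_from_hfov` / `project` are hypothetical helpers. Converting world positions and view yaw/pitch into that camera frame is not shown.

```python
import math
from typing import List, Tuple

Matrix3 = List[List[float]]


def pinhole_from_hfov(hfov_deg: float, width: int = 1280, height: int = 720) -> Matrix3:
    """3x3 pinhole intrinsics K from a horizontal field of view.

    Assumes square pixels (fx == fy) and the principal point at the image center.
    """
    if not 0.0 < hfov_deg < 180.0:
        raise ValueError("horizontal FOV must be in (0, 180) degrees")
    fx = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return [
        [fx, 0.0, width / 2.0],
        [0.0, fx, height / 2.0],
        [0.0, 0.0, 1.0],
    ]


def project(K: Matrix3, x: float, y: float, z: float) -> Tuple[float, float]:
    """Project a camera-frame point (x right, y down, z forward, z > 0) to pixels."""
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return K[0][0] * x / z + K[0][2], K[1][1] * y / z + K[1][2]
```

For example, a 90° horizontal FOV gives `fx ≈ 640`, so a point on the optical axis lands on the image center `(640, 360)`.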

## Dataset

`blanchon/cs2_dataset_render`, derived from `blanchon/cs2_dataset_demo`. Licensed under `CC-BY-4.0`.

In the `previews` and `chunks` configs, each row is a chunk of at most one minute from a single player's POV; `matches` and `rounds` are index tables. Four configs are exposed:

| Config               | Row                                                                                   | Use                                   |
| -------------------- | ------------------------------------------------------------------------------------- | ------------------------------------- |
| `previews` (default) | One low-res `preview.mp4` per chunk + 1 Hz inputs/world sidecars                      | Cheap browsing, viewer, sanity checks |
| `chunks`             | Path-only references to `video.mp4` + `audio.wav`, with embedded inputs/world streams | Full-resolution training              |
| `matches`            | One row per `(match_id, map_name)` with team/event/date metadata                      | Index / filtering                     |
| `rounds`             | One row per `(match_id, map_name, round)` with tick boundaries                        | Index / filtering                     |

### Filesystem layout

```text
index/
  manifest-<machine>-<uuid>.parquet     # matches index
  rounds-<machine>-<uuid>.parquet       # rounds index
data/
  match_id=<id>/map_name=<map>/player=<0-9>/
    chunks-preview-<machine>-<uuid>.parquet
    chunks-full-<machine>-<uuid>.parquet
    chunks/chunk_<n>/{video.mp4,audio.wav}
    previews/chunk_<n>/{preview.mp4,inputs.preview.json,world.preview.jsonl}
```
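The layout is regular enough that chunk paths can be built, and partition keys recovered, with plain string handling. A minimal sketch; `chunk_dir` and `partition_keys` are hypothetical helpers, `"12345"` / `"de_mirage"` are illustrative values, and it assumes `chunk_<n>` uses the bare integer with no zero padding, which the layout above does not specify.

```python
import re
from typing import Dict


def chunk_dir(match_id: str, map_name: str, player: int, chunk: int, kind: str = "previews") -> str:
    """Relative directory of one chunk: kind="chunks" (video.mp4, audio.wav) or "previews"."""
    if kind not in ("chunks", "previews"):
        raise ValueError('kind must be "chunks" or "previews"')
    if not 0 <= player <= 9:
        raise ValueError("player must be in 0-9")
    return (
        f"data/match_id={match_id}/map_name={map_name}/player={player}"
        f"/{kind}/chunk_{chunk}"
    )


# Hive-style key=value path segments, e.g. "map_name=de_mirage"
_PARTITION = re.compile(r"([A-Za-z_]+)=([^/]+)")


def partition_keys(path: str) -> Dict[str, str]:
    """Recover the partition keys (all as strings) from a dataset path."""
    return dict(_PARTITION.findall(path))
```

Paths are relative to the dataset repo root; tools that understand `hf://` URLs, such as DuckDB, can take them with the `hf://datasets/blanchon/cs2_dataset_render/` prefix.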

Hive-style `key=value` partitioning lets you prune at the path level. Recording starts at `freeze_end_tick` and stops at the player's death tick (or round end for survivors), so player streams have different durations within the same round.
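Because every player stream starts at `freeze_end_tick`, a round tick maps to a frame in that player's stream by a fixed offset. A hedged sketch: the 64 Hz tick rate is an assumption (a common CS2 server tick rate, not stated in this README), the 32 fps frame rate comes from the recording spec, the end tick is treated as exclusive, and `tick_to_frame` / `recorded_seconds` are hypothetical helpers.

```python
from typing import Optional

TICK_RATE_HZ = 64  # assumption: CS2 tick rate; verify against the dataset metadata
VIDEO_FPS = 32     # from the recording spec (1280x720 @ 32 fps)


def tick_to_frame(tick: int, freeze_end_tick: int, end_tick: int) -> Optional[int]:
    """Frame index in a player's round stream for a given tick, or None if not recorded.

    end_tick is the player's death tick, or the round end tick for survivors,
    and is treated as exclusive here (an assumption).
    """
    if tick < freeze_end_tick or tick >= end_tick:
        return None
    return (tick - freeze_end_tick) * VIDEO_FPS // TICK_RATE_HZ


def recorded_seconds(freeze_end_tick: int, end_tick: int) -> float:
    """Length of a player's stream for the round, in seconds."""
    return max(0, end_tick - freeze_end_tick) / TICK_RATE_HZ
```

Under these assumptions, a survivor recorded from tick 1000 to 1640 has a 10 s stream, while a teammate who died at tick 1320 has only 5 s, which is why streams within a round differ in length.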

### Quick query

```python
from datasets import load_dataset

# Default lightweight preview rows
previews = load_dataset("blanchon/cs2_dataset_render", split="train", streaming=True)

# Full training rows, columnar load
chunks = load_dataset(
    "blanchon/cs2_dataset_render", "chunks",
    split="train", streaming=True,
    columns=["video", "audio", "inputs", "worlds", "match_id", "round", "player"],
    filters=[("player", "==", 0)],
)
```

```sql
-- Match index via DuckDB
SELECT match_id, map_name, team1, team2, event, match_date
FROM 'hf://datasets/blanchon/cs2_dataset_render/index/manifest-*.parquet'
LIMIT 20;
```

## Develop

```sh
bun install
bun run dev      # vite dev server
bun run build    # static build → dist/
bun run preview  # serve the build
bun run check    # svelte-check
```

The repo ships a `Dockerfile` and `serve.ts` that mirror the Hugging Face Space deployment.

### Build flags

- `PUBLIC_DISABLE_EVAL=1`: hide the evaluation surface (header toggle, eval bar, flag dialog) for public deploys. Set at build time only (`PUBLIC_DISABLE_EVAL=1 bun run build`); the eval module stays in the bundle but is never rendered.

## Viewer internals

- **`hyparquet` + `hyparquet-compressors`** read the match and round parquet shards directly from `hf://` URLs. No server, no DuckDB, no WASM bundle larger than necessary.
- **`mediabunny`** powers the 10 chunked POV players. One round's worth of `preview.mp4` chunks per player is concatenated into a single virtual timeline; players who died early have shorter timelines and gracefully fall out.
- **`world.preview.jsonl`** drives the minimap (player positions, team, alive/dead, weapon) and the per-tick state overlay.
- **`inputs.preview.json`** drives the input-overlay HUD (movement keys, mouse, fire/jump).
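The viewer does its chunk concatenation in the browser with `mediabunny`; the following is not that code, only a Python illustration of the same idea. Each player's chunk durations become one virtual timeline, a shared round time maps to `(chunk_index, offset_in_chunk)`, and players whose stream has ended report `None`. Chunk durations are assumed to be known up front, and `VirtualTimeline` / `sync_all` are hypothetical names.

```python
import bisect
from itertools import accumulate
from typing import List, Optional, Tuple

Position = Optional[Tuple[int, float]]


class VirtualTimeline:
    """One player's round as a single timeline built from back-to-back chunks."""

    def __init__(self, chunk_durations: List[float]) -> None:
        if any(d <= 0 for d in chunk_durations):
            raise ValueError("chunk durations must be positive")
        # starts[i] is the global start time of chunk i; starts[-1] is the total length.
        self.starts = [0.0, *accumulate(chunk_durations)]

    @property
    def duration(self) -> float:
        return self.starts[-1]

    def locate(self, t: float) -> Position:
        """(chunk_index, offset_in_chunk) for global time t, or None outside the stream."""
        if t < 0 or t >= self.duration:
            return None  # e.g. the player died and their stream is over
        i = bisect.bisect_right(self.starts, t) - 1
        return i, t - self.starts[i]


def sync_all(timelines: List[VirtualTimeline], t: float) -> List[Position]:
    """Positions of all POVs at the shared time t; None marks players who dropped out."""
    return [tl.locate(t) for tl in timelines]
```

Binary search over cumulative start times keeps seeking O(log n) per player, so scrubbing all 10 POVs stays cheap even for long rounds.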

The whole viewer is a static SvelteKit build; there is no backend.
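`world.preview.jsonl` is line-delimited JSON sampled at 1 Hz, and it feeds the minimap. Its field names are not documented here, so this sketch uses hypothetical keys (`players`, `x`, `y`, `alive`) and made-up radar parameters. The `pos_x` / `pos_y` / `scale` mapping follows the convention of CS radar overview files; it is an assumption, not the viewer's confirmed method. Team and weapon fields are left out.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse line-delimited JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def world_to_radar(x: float, y: float, pos_x: float, pos_y: float, scale: float) -> Tuple[float, float]:
    """Map world X/Y to radar pixels (origin top-left, image y grows downward)."""
    return (x - pos_x) / scale, (pos_y - y) / scale


def minimap_points(record: dict, pos_x: float, pos_y: float, scale: float) -> List[Tuple[float, float, bool]]:
    """Radar position and alive flag of every player in one (hypothetically shaped) record."""
    return [
        (*world_to_radar(p["x"], p["y"], pos_x, pos_y, scale), p["alive"])
        for p in record["players"]
    ]
```

In practice the lines would come from `open(".../world.preview.jsonl")` or a fetched response body; any iterable of strings works.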

## Citation

```bibtex
@misc{blanchon2026opencs2,
  author       = {Julien Blanchon},
  title        = {OpenCS2 Dataset},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://github.com/julien-blanchon/opencs2-dataset}},
}
```

## License

Code: MIT. Dataset: CC-BY-4.0.