CloneCharter
CloneCharter is an encoder-decoder Transformer that takes an audio file and
generates a playable Clone Hero chart (.chart format).
Given a song, it transcribes guitar, bass, or drum notes at any of the four
supported difficulty levels.
Model Architecture
                      ┌────────────────────────────────────┐
Audio (MP3/OGG/WAV)   │              ENCODER               │
         │            │                                    │
         ▼            │  AudioCNNFrontEnd                  │
    Demucs stem       │  (Conv2D × 3, stride-based)        │
    separation        │  [B, 512 mels, T] → [B, T/16, d]   │
         │            │                 +                  │
         ▼            │  ConditioningEncoder               │
    Log-mel           │  7 prefix tokens from metadata     │
    spectrogram       │  (BPM, TS, instrument, difficulty, │
    [512 mels × T]    │   resolution, offset, MERT emb)    │
         │            │                 +                  │
    MERT embedding ──►│  8-layer TransformerEncoder        │
    [768-d]           │  (pre-norm, bidirectional)         │
                      └─────────────────┬──────────────────┘
                                        │ enc_out [B, 512, 768]
                      ┌─────────────────▼──────────────────┐
                      │              DECODER               │
                      │                                    │
                      │  12-layer autoregressive           │
                      │  TransformerDecoder                │
                      │  (causal self-attn + cross-attn)   │
                      │                 +                  │
                      │  Output projection (weight-tied    │
                      │  with token embedding)             │
                      └─────────────────┬──────────────────┘
                                        │
                                        ▼
                                 Token sequence
                        (Beat / Pitch / Duration tokens)
                                        │
                                        ▼
                                notes.chart file
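The final stage renders decoded notes into a `.chart` file. Below is a minimal sketch of that rendering step. It assumes the token sequence has already been decoded into beat positions, fret numbers, and sustain lengths in beats. `Note` and `write_chart` are hypothetical names. The default resolution of 192 ticks per beat is an assumption based on a common `.chart` value. A real chart usually carries more song metadata than the single `Resolution` field written here. The sketch writes one constant tempo (stored as BPM × 1000) and a 4/4 signature.

```python
from dataclasses import dataclass


@dataclass
class Note:
    beat: float     # onset position in beats from the start of the song
    fret: int       # 0-4 = the five fret buttons (guitar/bass)
    sustain: float  # sustain length in beats (0 for a non-sustained note)


def write_chart(notes, bpm, resolution=192, section="ExpertSingle"):
    """Render a minimal .chart text with one constant tempo and one note track.

    Hypothetical helper: `resolution` (ticks per beat) and the section name
    are assumptions, not values confirmed by CloneCharter.
    """
    lines = ["[Song]", "{", f"  Resolution = {resolution}", "}"]
    # SyncTrack: 4/4 time signature and tempo; .chart stores tempo as BPM * 1000.
    lines += ["[SyncTrack]", "{", "  0 = TS 4", f"  0 = B {round(bpm * 1000)}", "}"]
    lines += [f"[{section}]", "{"]
    # Events must be in tick order; notes sharing a tick form a chord.
    for note in sorted(notes, key=lambda n: (n.beat, n.fret)):
        tick = round(note.beat * resolution)
        length = round(note.sustain * resolution)
        lines.append(f"  {tick} = N {note.fret} {length}")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, `write_chart([Note(4.0, 0, 0.0), Note(5.0, 1, 0.5)], bpm=120)` produces the note lines `768 = N 0 0` and `960 = N 1 96`.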
Key hyperparameters
| Parameter | Value |
|---|---|
| d_model | 768 |
| Encoder layers | 8 |
| Decoder layers | 12 |
| Attention heads | 12 |
| FFN dim | 3 072 |
| Vocabulary size | 693 |
| Max encoder length | 512 tokens |
| Max decoder length | 2 048 tokens |
| Mixed precision | bf16 |
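The table can be captured as a small config object. The sketch below is illustrative only: `CloneCharterConfig` and its field names are hypothetical and are not the project's actual config class. It also derives two values from the table. The first is the per-head dimension. The second is a back-of-envelope weight count for the Transformer stacks. That count ignores biases, layer norms, positional encodings, the CNN frontend, and the conditioning encoder. It counts the tied output projection only once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneCharterConfig:
    # Values taken from the hyperparameter table; field names are hypothetical.
    d_model: int = 768
    encoder_layers: int = 8
    decoder_layers: int = 12
    n_heads: int = 12
    ffn_dim: int = 3072
    vocab_size: int = 693
    max_encoder_len: int = 512
    max_decoder_len: int = 2048

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of d_model.
        if self.d_model % self.n_heads:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads

    def approx_params(self) -> int:
        """Rough weight count (matrices only, no biases or norms)."""
        d, f = self.d_model, self.ffn_dim
        attn = 4 * d * d   # Q, K, V and output projections
        ffn = 2 * d * f    # two linear layers per feed-forward block
        encoder = self.encoder_layers * (attn + ffn)
        decoder = self.decoder_layers * (2 * attn + ffn)  # self-attn + cross-attn
        embedding = self.vocab_size * d  # shared with the tied output projection
        return encoder + decoder + embedding
```

With the table's values, `head_dim` is 64, the FFN is 4× wider than `d_model`, and `approx_params()` gives 170,401,536, or roughly 170M weights before the omitted components.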
Tokenization
The tokenizer (CloneHeroTokenizer) uses a hierarchical beat-based vocabulary:
- Special tokens: `<BOS>`, `<EOS>`, `<UNK>`, `<PAD>`
- Instrument tokens: `<Guitar>`, `<Bass>`, `<Drums>`
- Difficulty tokens: `<Expert>`, `<Hard>`, `<Medium>`, `<Easy>`
- Temporal position: `<Minute_N>`, `<Beat_N>`, `<Beatshift_N>` (sub-beat offset on a 1/32-beat grid)
- Pitch: `<Pitch_N>` (guitar/bass, 5-fret buttons 0-4) or `<DrumsPitch_N>` (drums)
- Duration: `<Beatshift_N>` (sustain length in 1/32-beat units)
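On a 1/32-beat grid, both note positions and sustain lengths are snapped to multiples of 1/32 of a beat. Below is a sketch of that snapping. It assumes positions are already expressed in beats. `quantize_position` and `quantize_duration` are hypothetical helpers. The sketch leaves out the `<Minute_N>` token because its relation to the beat index is not specified here.

```python
GRID = 32  # sub-beat steps per beat (the 1/32-beat grid)


def quantize_position(beats: float) -> tuple[int, int]:
    """Split a position in beats into (whole beat, 1/32-beat offset).

    Rounds to the nearest grid step first, so an offset that rounds up to a
    full beat carries into the next beat instead of producing offset 32.
    """
    steps = round(beats * GRID)
    return divmod(steps, GRID)


def quantize_duration(beats: float) -> int:
    """Sustain length in 1/32-beat units (never negative)."""
    return max(0, round(beats * GRID))
```

For example, `quantize_position(3.5)` gives `(3, 16)`, and `quantize_position(2.99)` rounds to 96 steps and carries into the next beat, giving `(3, 0)`.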
Each note is encoded as a 6-token block:
`<Beatshift> <NoteType> <Pitch> <Minute> <Beat> <DurationBeatshift>`
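The fixed block layout makes encoding and decoding a matter of emitting and splitting 6-token groups in that order. Below is a sketch under several assumptions. The exact token spellings are hypothetical. The `<NoteType>` token is passed through as an opaque name, because its values are not listed above. `NoteBlock`, `encode_block`, and `decode_blocks` are illustrative names. The decoder also assumes the stream holds only note blocks, with the special, instrument, and difficulty tokens already stripped.

```python
from typing import NamedTuple

BLOCK_SIZE = 6  # Beatshift, NoteType, Pitch, Minute, Beat, DurationBeatshift


class NoteBlock(NamedTuple):
    beatshift: int  # sub-beat offset on the 1/32 grid
    note_type: str  # opaque NoteType token name (values not specified)
    pitch: int      # fret 0-4 for guitar/bass
    minute: int
    beat: int
    duration: int   # sustain in 1/32-beat units


def encode_block(note: NoteBlock) -> list[str]:
    # Hypothetical token spellings; the real vocabulary strings may differ.
    return [
        f"<Beatshift_{note.beatshift}>",
        f"<{note.note_type}>",
        f"<Pitch_{note.pitch}>",
        f"<Minute_{note.minute}>",
        f"<Beat_{note.beat}>",
        f"<Beatshift_{note.duration}>",  # duration reuses the Beatshift token type
    ]


def _value(token: str) -> int:
    # "<Pitch_3>" -> 3
    return int(token.strip("<>").rsplit("_", 1)[1])


def decode_blocks(tokens: list[str]) -> list[NoteBlock]:
    if len(tokens) % BLOCK_SIZE:
        raise ValueError("token count is not a multiple of the block size")
    blocks = []
    for i in range(0, len(tokens), BLOCK_SIZE):
        shift, kind, pitch, minute, beat, dur = tokens[i:i + BLOCK_SIZE]
        blocks.append(NoteBlock(_value(shift), kind.strip("<>"), _value(pitch),
                                _value(minute), _value(beat), _value(dur)))
    return blocks
```

Encoding a note and decoding the result returns the same `NoteBlock`. A stream whose length is not a multiple of six is rejected rather than silently truncated.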
Audio Processing
- Stem separation: Demucs v4 isolates the guitar, bass, or drum stem from the full mix.
- Log-mel spectrogram: 512 mel bands, FFT size 4096, hop length 1024 at 44 100 Hz. The CNN frontend reduces the number of time steps by 16×.
- MERT embeddings: a global MERT-v1-95M embedding (768-d) captures harmonic and rhythmic context.
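These settings fix how much audio fits in one encoder pass. Below is a sketch of the arithmetic. The constant names are hypothetical. Two points are not stated in the card and are treated as assumptions. The first is whether the 7 conditioning prefix tokens count toward the 512-token encoder limit, which the `prefix_tokens` argument covers. The second is how songs longer than the budget are handled, for example by chunking.

```python
SAMPLE_RATE = 44_100   # Hz
N_FFT = 4096
HOP = 1024
TIME_REDUCTION = 16    # CNN frontend: [B, 512 mels, T] -> [B, T/16, d]
MAX_ENCODER_LEN = 512  # from the hyperparameter table


def audio_budget(prefix_tokens: int = 0) -> dict:
    """Frame rates and the longest audio span one encoder pass can cover."""
    frame_s = HOP / SAMPLE_RATE          # seconds per mel frame (~23.2 ms)
    token_s = frame_s * TIME_REDUCTION   # seconds per encoder audio token
    audio_tokens = MAX_ENCODER_LEN - prefix_tokens
    return {
        "stft_bins": N_FFT // 2 + 1,     # one-sided FFT bins before mel pooling
        "mel_frames_per_s": SAMPLE_RATE / HOP,
        "encoder_tokens_per_s": 1 / token_s,
        "max_audio_s": audio_tokens * token_s,
    }
```

Each encoder token spans 16 × 1024 / 44 100 ≈ 0.37 s, so 512 tokens cover about 190 s of audio. If the 7 prefix tokens share that budget, coverage drops to about 188 s.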
Intended Use
- Automatic Clone Hero chart generation from any audio file.
- Supported instruments: Lead Guitar, Rhythm Guitar, Bass Guitar, Drums.
- Supported difficulties: Expert, Hard, Medium, Easy.
Limitations
- Performance degrades on heavily distorted or layered mixes.
- BPM estimation may be inaccurate for tracks with variable tempo.
- Trained only on songs with 4/4 time signature.
Citation
@misc{clonecharter2026,
author = {thejorseman},
title = {CloneCharter: Automatic Clone Hero Chart Generation},
year = {2026},
url = {https://huggingface.co/thejorseman/CloneCharter}
}