---
library_name: pytorch
license: apache-2.0
pipeline_tag: feature-extraction
tags:
- tracelock
- dream
- diffusion-language-model
- activation-autoencoder
- pytorch
---

# TraceLock Dream Activation Autoencoder

This repository contains the projection autoencoder checkpoint used to reproduce TraceLock on Dream, as presented in the paper [The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models](https://huggingface.co/papers/2605.24697).

**Code**: [https://github.com/BobSun98/TraceLock](https://github.com/BobSun98/TraceLock)

TraceLock is a token-level acceptance policy for Dream-style masked diffusion generation. Dream proposes candidate tokens during the denoising loop, and TraceLock decides which positions should be locked now versus kept masked for later refinement.

## What This Checkpoint Is

`best_val_loss.pt` is an activation autoencoder for Dream hidden states. It compresses the last three Dream hidden-state snapshots and two hidden-state deltas into compact features consumed by the TraceLock policy model.

This checkpoint is not a text generation model and does not contain Dream model weights. Users still need to download Dream from its original repository:

```text
Dream-org/Dream-v0-Instruct-7B
```

## How It Is Used

After downloading this repository into a TraceLock workspace, the expected local path is:

```text
$TRACELOCK_HOME/checkpoints/dream-ae-v1/best_val_loss.pt
```

TraceLock uses this checkpoint in two places:

1. `generate_training_traces.sh`: projects Dream activations while building training traces.
2. `train.sh` / evaluation: reconstructs the same projection stack expected by the TraceLock policy.

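
To check that the checkpoint sits where the scripts above expect it, the following is a minimal sketch and not part of the TraceLock codebase. The helper names `resolve_checkpoint`, `summarize`, and `inspect_checkpoint` are hypothetical. The internal key layout of `best_val_loss.pt` is not documented in this card, so the sketch only lists whatever keys and tensor shapes it finds.

```python
import os
from pathlib import Path


def resolve_checkpoint(home=None):
    """Return the expected local path of the autoencoder checkpoint.

    Falls back to the TRACELOCK_HOME environment variable when `home`
    is not given (raises KeyError if that variable is unset).
    """
    root = Path(home or os.environ["TRACELOCK_HOME"])
    return root / "checkpoints" / "dream-ae-v1" / "best_val_loss.pt"


def summarize(obj, prefix=""):
    """Flatten a nested dict into {dotted_key: tensor shape or type name}."""
    out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(summarize(value, f"{prefix}{key}."))
    else:
        shape = getattr(obj, "shape", None)
        out[prefix.rstrip(".")] = (
            tuple(shape) if shape is not None else type(obj).__name__
        )
    return out


def inspect_checkpoint(path):
    """Load the checkpoint on CPU and summarize its contents."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # Assumption: the file loads with weights_only=True. If it stores other
    # pickled objects, a trusted copy may need weights_only=False instead.
    state = torch.load(path, map_location="cpu", weights_only=True)
    return summarize(state)


if __name__ == "__main__":
    ckpt = resolve_checkpoint()
    print(f"Expected checkpoint path: {ckpt}")
    if ckpt.exists():
        for name, info in inspect_checkpoint(ckpt).items():
            print(name, info)
```

Running the script with `TRACELOCK_HOME` set prints the resolved path and, if the file is present, one line per entry found in the checkpoint.
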
## Architecture

The released checkpoint was trained with:

```json
{
  "d_model": 3584,
  "d_hidden_bottleneck": 256,
  "d_delta_bottleneck": 32,
  "dropout": 0.1
}
```

The exported projection state contains:

- hidden-state normalization
- delta-state normalization
- hidden-state projection encoder
- delta-state projection encoder

## Files

- `best_val_loss.pt`: projection autoencoder checkpoint.
- `config.json`: training/configuration metadata for this autoencoder run.
- `data_stats.json`: basic sample count and batch metadata from the run.

## Citation

If you use this checkpoint, please cite the following paper:

```bibtex
@misc{sun2026pathmatters,
  title={The Path Matters: Learning a Token-Commitment Policy for Diffusion Language Models},
  author={Bohang Sun and Max Zhu and Francesco Caso and Jindong Gu and Junchi Yu and Philip Torr and Pietro Liò and Jialin Yu},
  year={2026},
  eprint={2605.24697},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.24697}
}
```