# Jane Street Warmup LoRA

Extracted LoRA adapter from `jane-street/dormant-model-warmup`, obtained by diffing its weights against the base model `Qwen/Qwen2.5-7B-Instruct`.
## Files

- `adapter_model.safetensors` (232 MB) - LoRA weights in PEFT format
- `adapter_config.json` - PEFT configuration
- `extract_warmup_lora_colab.ipynb` - Extraction notebook (GPU-optimized)
- `executed_notebook.ipynb` - Executed version with results
## Findings

- 84 changed layers: all MLP projections (`down_proj`, `gate_proj`, `up_proj`) across 28 transformer layers
- 255 unchanged layers: all attention layers frozen
- LoRA rank: 64
- Target modules: `down_proj`, `gate_proj`, `up_proj`
- Base model: `Qwen/Qwen2.5-7B-Instruct` (7.6B params)
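The findings above correspond to a PEFT `adapter_config.json` along these lines. This is a sketch reconstructed from the listed rank and target modules; the `lora_alpha` and `task_type` values shown here are assumptions, not read from the actual file:

```json
{
  "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
  "peft_type": "LORA",
  "r": 64,
  "lora_alpha": 64,
  "target_modules": ["down_proj", "gate_proj", "up_proj"],
  "task_type": "CAUSAL_LM"
}
```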
## Approach

This is the opposite of dormant-model-1, which modified only attention layers; the warmup model follows the standard LoRA practice of targeting the MLP layers.
## Extraction Method
- Stream-loaded weights one tensor at a time (memory-efficient)
- Compared base vs warmup models
- GPU-accelerated SVD factorization (PyTorch)
- Saved in PEFT format for compatibility
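The core of the method is factoring each weight difference into low-rank LoRA matrices via truncated SVD. A minimal sketch of that step, written here with NumPy for portability (the notebook itself uses PyTorch on GPU); `extract_lora` is an illustrative name, not a function from the notebook:

```python
import numpy as np

def extract_lora(delta, rank=64):
    """Factor a weight delta (W_warmup - W_base) into LoRA A/B matrices.

    Returns (A, B) with shapes (rank, in_features) and (out_features, rank)
    such that B @ A approximates delta.
    """
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    # Keep the top-`rank` singular directions, splitting sqrt(S)
    # evenly across the two factors.
    sqrt_s = np.sqrt(S[:rank])
    B = U[:, :rank] * sqrt_s          # (out_features, rank)
    A = sqrt_s[:, None] * Vt[:rank]   # (rank, in_features)
    return A, B
```

If the true delta has rank at or below the chosen `rank`, the factorization reconstructs it exactly; otherwise it is the best rank-`rank` approximation in the least-squares sense.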
## Usage

Convert to GGUF for llama.cpp:

```bash
cd /path/to/llama.cpp
python convert_lora_to_gguf.py /path/to/warmup-lora --outfile warmup-lora.gguf --outtype f16
```
Then use the resulting adapter with a diff-amplify tool for contrast-consistent search.
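The diff-amplify tool itself is not documented here, but amplifying a LoRA diff generically means scaling the update `delta = B @ A` by a scalar, which only requires scaling one factor. A minimal sketch of that idea (`amplify_lora` is an illustrative name, not part of any tool's API):

```python
import numpy as np

def amplify_lora(A, B, factor=2.0):
    """Scale the LoRA update delta = B @ A by `factor`.

    Scaling one factor is sufficient: (factor * B) @ A == factor * (B @ A).
    """
    return A, factor * B
```

Note that PEFT applies an additional `lora_alpha / r` multiplier to the update, so the same effect can also be achieved by raising `lora_alpha` in `adapter_config.json` without touching the weights.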