
Jane Street Warmup LoRA

A LoRA adapter extracted from jane-street/dormant-model-warmup by diffing its weights against the base model Qwen/Qwen2.5-7B-Instruct.

Files

  • adapter_model.safetensors (232MB) - LoRA weights in PEFT format
  • adapter_config.json - PEFT configuration
  • extract_warmup_lora_colab.ipynb - Extraction notebook (GPU-optimized)
  • executed_notebook.ipynb - Executed version with results
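For reference, the key fields of adapter_config.json look roughly like this (rank and target modules taken from the findings below; lora_alpha and the remaining fields are illustrative assumptions, not a verbatim copy of the shipped file):

```json
{
  "peft_type": "LORA",
  "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
  "r": 64,
  "lora_alpha": 64,
  "target_modules": ["down_proj", "gate_proj", "up_proj"],
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
```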

Findings

  • 84 changed weight tensors: the three MLP projections (down_proj, gate_proj, up_proj) in each of the 28 transformer layers
  • 255 unchanged weight tensors: all attention weights are identical to the base model
  • LoRA rank: 64
  • Target modules: down_proj, gate_proj, up_proj
  • Base model: Qwen/Qwen2.5-7B-Instruct (7.6B params)

Approach

This is the opposite of dormant-model-1, which modified only attention layers. The warmup model follows the standard LoRA practice of targeting the MLP layers.

Extraction Method

  1. Stream-load the weights one tensor at a time (memory-efficient)
  2. Compute the per-tensor difference between the base and warmup models
  3. Factorize each difference with GPU-accelerated SVD in PyTorch, truncated to rank 64
  4. Save the resulting factors in PEFT format for compatibility
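The core of steps 2-3 can be sketched as follows. The notebook runs the SVD on GPU with PyTorch; NumPy is used here only to keep the sketch self-contained, and the function name and demo shapes are illustrative, not taken from the notebook:

```python
import numpy as np

def extract_lora(w_base: np.ndarray, w_tuned: np.ndarray, rank: int = 64):
    """Factor the weight delta into LoRA matrices so that B @ A approximates it."""
    delta = w_tuned - w_base                     # per-tensor difference
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    b = u[:, :rank] * s[:rank]                   # (out_features, rank)
    a = vt[:rank, :]                             # (rank, in_features)
    return a, b

# Demo on a synthetic low-rank update: the rank-64 factorization recovers the
# delta (near-)exactly whenever the true update has rank <= 64, as it does for
# a model fine-tuned with rank-64 LoRA.
rng = np.random.default_rng(0)
base = rng.standard_normal((256, 128))
true_b = rng.standard_normal((256, 64))
true_a = rng.standard_normal((64, 128))
tuned = base + true_b @ true_a

a, b = extract_lora(base, tuned, rank=64)
print(np.allclose(b @ a, tuned - base, atol=1e-8))  # True
```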

Usage

Convert to GGUF for llama.cpp:

cd /path/to/llama.cpp
python convert_lora_to_gguf.py /path/to/warmup-lora --outfile warmup-lora.gguf --outtype f16

Then use it with the diff-amplify tool for contrast-consistent search.
