# Jane Street Warmup LoRA

Extracted LoRA adapter from `jane-street/dormant-model-warmup`, obtained by diffing its weights against the base model `Qwen/Qwen2.5-7B-Instruct`.
## Files

- `adapter_model.safetensors` (232 MB) - LoRA weights in PEFT format
- `adapter_config.json` - PEFT configuration
- `extract_warmup_lora_colab.ipynb` - Extraction notebook (GPU-optimized)
- `executed_notebook.ipynb` - Executed version with results
## Findings

- 84 changed layers: all MLP projections (`down_proj`, `gate_proj`, `up_proj`) across 28 transformer layers
- 255 unchanged layers: all attention layers frozen
- LoRA rank: 64
- Target modules: `down_proj`, `gate_proj`, `up_proj`
- Base model: `Qwen/Qwen2.5-7B-Instruct` (7.6B params)
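The findings above correspond to a PEFT `adapter_config.json` along these lines. This is a sketch reconstructed from the listed rank and target modules; the `lora_alpha` and `task_type` values shown here are assumptions, not read from the actual file:

```json
{
  "base_model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
  "peft_type": "LORA",
  "r": 64,
  "lora_alpha": 64,
  "target_modules": ["down_proj", "gate_proj", "up_proj"],
  "task_type": "CAUSAL_LM"
}
```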
## Approach

This is the opposite of dormant-model-1, which modified only attention layers; the warmup model follows the standard LoRA practice of targeting the MLP layers.
## Extraction Method
- Stream-loaded weights one tensor at a time (memory-efficient)
- Compared base vs warmup models
- GPU-accelerated SVD factorization (PyTorch)
- Saved in PEFT format for compatibility
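The core of the method is factoring each weight difference into low-rank LoRA matrices via truncated SVD. A minimal sketch of that step, written here with NumPy for portability (the notebook itself uses PyTorch on GPU); `extract_lora` is an illustrative name, not a function from the notebook:

```python
import numpy as np

def extract_lora(delta, rank=64):
    """Factor a weight delta (W_warmup - W_base) into LoRA A/B matrices.

    Returns (A, B) with shapes (rank, in_features) and (out_features, rank)
    such that B @ A approximates delta.
    """
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    # Keep the top-`rank` singular directions, splitting sqrt(S)
    # evenly across the two factors.
    sqrt_s = np.sqrt(S[:rank])
    B = U[:, :rank] * sqrt_s          # (out_features, rank)
    A = sqrt_s[:, None] * Vt[:rank]   # (rank, in_features)
    return A, B
```

If the true delta has rank at or below the chosen `rank`, the factorization reconstructs it exactly; otherwise it is the best rank-`rank` approximation in the least-squares sense.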
## Usage

Convert to GGUF for llama.cpp:

```bash
cd /path/to/llama.cpp
python convert_lora_to_gguf.py /path/to/warmup-lora --outfile warmup-lora.gguf --outtype f16
```
Then use the resulting adapter with a diff-amplify tool for contrast-consistent search.
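The diff-amplify tool itself is not documented here, but amplifying a LoRA diff generically means scaling the update `delta = B @ A` by a scalar, which only requires scaling one factor. A minimal sketch of that idea (`amplify_lora` is an illustrative name, not part of any tool's API):

```python
import numpy as np

def amplify_lora(A, B, factor=2.0):
    """Scale the LoRA update delta = B @ A by `factor`.

    Scaling one factor is sufficient: (factor * B) @ A == factor * (B @ A).
    """
    return A, factor * B
```

Note that PEFT applies an additional `lora_alpha / r` multiplier to the update, so the same effect can also be achieved by raising `lora_alpha` in `adapter_config.json` without touching the weights.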