# Shrew LoRA Adapters
Work in progress -- adapters are functional but under active development.
LoRA adapters for Qwen/Qwen3.5-2B fine-tuned for structured extraction as part of a production RAG application. These are the models that power Shrew's structured extraction pipeline.
## Adapters

| Adapter | Task | LoRA rank / alpha | Status |
|---|---|---|---|
| `extract_metadata/` | Extract structured metadata (title, authors, dates, etc.) from document text | r32 / alpha 64 | Production |
| `summarize_document/` | Generate document summaries | r32 / alpha 64 | Production |
| `semantic_chunk/` | Split documents into semantically coherent sections | r128 / alpha 256 | Beta (50k subset, paused at 57%) |
## Usage

Load an adapter with PEFT on top of the base model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-2B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-2B")
model = PeftModel.from_pretrained(base, "./extract_metadata")
```
The GGUF versions can be used with llama.cpp as LoRA adapters:

```shell
llama-cli -m Qwen3.5-2B.gguf --lora semantic_chunk.gguf -p "<prompt>"
```
For the full pipeline integration, see shrew-server.
## vLLM

vLLM loads Qwen3.5 as `Qwen3_5ForConditionalGeneration` (a VLM class), which nests the language model under a `language_model.` prefix. These adapters are saved in standard PEFT format (they were trained with `AutoModelForCausalLM`), so their weight keys must be renamed before serving with vLLM's `--enable-lora`. Without the rename, vLLM silently zeroes all LoRA weights and raises no error.
Apply the fix with `fix_lora_keys.py` from the shrew repo:

```shell
python fix_lora_keys.py path/to/adapter
```
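The rename amounts to inserting the `language_model.` module prefix into each LoRA weight key. A minimal sketch of the idea, not the actual `fix_lora_keys.py` (the exact insertion point of the prefix and the example key layout are assumptions; the real script also handles reading and writing the safetensors file):

```python
# Sketch: rewrite PEFT LoRA keys so they match vLLM's nested VLM module path.
# Assumption: PEFT saves keys like
#   base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
# and vLLM expects the inner language model nested under "language_model.".
def rename_lora_keys(state_dict: dict, prefix: str = "language_model.") -> dict:
    """Return a new state dict with the vLLM module prefix spliced into each key."""
    renamed = {}
    for key, value in state_dict.items():
        # Insert the prefix once, right after the PEFT wrapper path.
        new_key = key.replace("base_model.model.", "base_model.model." + prefix, 1)
        renamed[new_key] = value
    return renamed
```

The actual script applies this over the adapter's `adapter_model.safetensors` in place; use it rather than this sketch.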
## Sampling Parameters

Use Qwen 3.5 instruct-general parameters with `enable_thinking=False`:

- temperature: 0.7
- top_p: 0.8
- top_k: 20
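When serving through vLLM's OpenAI-compatible API, these parameters can be passed per request. A hedged sketch (the port, the served adapter name `extract_metadata`, and the prompt are placeholders, not part of this repo):

```python
import json
import urllib.request

# Build a chat-completions request carrying the recommended sampling parameters.
payload = {
    "model": "extract_metadata",  # name the adapter was served under (assumption)
    "messages": [{"role": "user", "content": "<document text>"}],
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,  # vLLM extension beyond the standard OpenAI schema
    "chat_template_kwargs": {"enable_thinking": False},
}
request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With a server running: urllib.request.urlopen(request).read()
```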
## License
Same as base model (Qwen/Qwen3.5-2B).