Converted with llm-compressor, using 200 calibration examples from henrik3/sweep-calibration (admittedly not the highest-quality calibration set, but it should be sufficient).

Performance

SGLang on an RTX 2000 Ada (224 GB/s memory bandwidth)

Command:

python3 -m sglang.launch_server --model-path henrik3/sweep-next-edit-v2-7B-AWQ --port 8000 --host 0.0.0.0 --trust-remote-code --mem-fraction-static 0.8 --context-length 16384 --speculative-algorithm NGRAM --speculative-num-draft-tokens 4
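
Once launched, the server exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming the default `/v1/chat/completions` route and using the repo name as the model id (the prompt is purely illustrative):

```python
import json
import urllib.request

def build_request(prompt, base_url="http://localhost:8000"):
    """Build an OpenAI-style chat completion request for the local server."""
    payload = {
        "model": "henrik3/sweep-next-edit-v2-7B-AWQ",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,  # in line with the ~250 output tokens reported below
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the server running, send the request and read the reply:
# resp = urllib.request.urlopen(build_request("Suggest the next edit."))
# print(json.load(resp)["choices"][0]["message"]["content"])
```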

Stats

  • ~6900 input tokens
  • ~250 output tokens
  • ~600ms response time
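
As a rough sanity check on the numbers above (treating the full ~600 ms as decode time, which overstates throughput since it also covers prefill of the ~6900 input tokens):

```python
output_tokens = 250   # approximate output length from the stats above
response_s = 0.600    # approximate end-to-end response time

# Upper-bound estimate of decode throughput; the real decode rate is
# lower because prefill is included in the 600 ms.
tokens_per_s = output_tokens / response_s
print(round(tokens_per_s))  # → 417
```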

Find the original model here: https://huggingface.co/sweepai/sweep-next-edit-v2-7B

Model tree for henrik3/sweep-next-edit-v2-7B-AWQ

  • Base model: Qwen/Qwen2.5-7B (this repo is an AWQ quantization of a fine-tune of it)