Converted with llm-compressor, using 200 calibration examples from henrik3/sweep-calibration (admittedly not the highest-quality calibration set, but it should be sufficient).

Performance

SGLang on an RTX 2000 Ada (224 GB/s memory bandwidth)

Command:

python3 -m sglang.launch_server --model-path henrik3/sweep-next-edit-v2-7B-AWQ --port 8000 --host 0.0.0.0 --trust-remote-code --mem-fraction-static 0.8 --context-length 16384 --speculative-algorithm NGRAM --speculative-num-draft-tokens 4
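
Once launched, the server exposes an OpenAI-compatible HTTP API. A minimal client sketch, assuming the default `/v1/chat/completions` route and using the repo name as the model id (the prompt is purely illustrative):

```python
import json
import urllib.request

def build_request(prompt, base_url="http://localhost:8000"):
    """Build an OpenAI-style chat completion request for the local server."""
    payload = {
        "model": "henrik3/sweep-next-edit-v2-7B-AWQ",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,  # in line with the ~250 output tokens reported below
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# With the server running, send the request and read the reply:
# resp = urllib.request.urlopen(build_request("Suggest the next edit."))
# print(json.load(resp)["choices"][0]["message"]["content"])
```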

Stats

  • ~6900 input tokens
  • ~250 output tokens
  • ~600ms response time
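
As a rough sanity check on the numbers above (treating the full ~600 ms as decode time, which overstates throughput since it also covers prefill of the ~6900 input tokens):

```python
output_tokens = 250   # approximate output length from the stats above
response_s = 0.600    # approximate end-to-end response time

# Upper-bound estimate of decode throughput; the real decode rate is
# lower because prefill is included in the 600 ms.
tokens_per_s = output_tokens / response_s
print(round(tokens_per_s))  # → 417
```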

Find the original model here: https://huggingface.co/sweepai/sweep-next-edit-v2-7B

Model tree for henrik3/sweep-next-edit-v2-7B-AWQ

  • Base model: Qwen/Qwen2.5-7B (this repo is an AWQ quantization of a fine-tune of it)