GPTQ μ–‘μžν™”λŠ” V100κ³Ό 같은 compute capability κ°€ 7.0 인 μƒν™©μ—μ„œ vllm을 돌릴 수 μžˆλŠ” λͺ‡ μ•ˆλ˜λŠ” μ–‘μžν™” λ°©μ‹μž…λ‹ˆλ‹€.

Accuracy may fall short of AWQ or NVFP4 GGUF, but I hope you find it useful.

ν˜Ήμ‚¬λ‚˜ μ–‘μžν™” ν–ˆμœΌλ©΄ λͺ¨λΈμ΄ μžˆλ‹€λ©΄ https://github.com/LEE5J/llm-quant-lab issue λ‚¨κ²¨μ£Όμ‹œλ©΄ 확인후 μ–‘μžν™” ν•˜λ„λ‘ ν•˜κ² μŠ΅λ‹ˆλ‹€.

Safetensors
Model size: 34B params
Tensor types: I64 Β· I32 Β· BF16

Model tree for lee5j/EXAONE-4.5-33B_GPTQ8
