宋小猫
SongXiaoMao
AI & ML interests: None yet
Recent Activity
new activity 3 days ago on z-lab/Qwen3.5-27B-DFlash: "FP8 work for base model or is 16-bit of 27B required?"
liked a model 4 days ago: Qwen/Qwen3.5-27B-FP8
liked a model 4 days ago: Qwen/Qwen3.5-122B-A10B-GPTQ-Int4
Organizations: None yet
FP8 work for base model or is 16-bit of 27B required?
14
#2 opened 17 days ago by unoid
Is there anyone who can tell me how to run this model with vLLM correctly?
😔 3
7
#8 opened 14 days ago by beginor
Could you quantize this model to MXFP4? Thank you!!
#3 opened 15 days ago by SongXiaoMao
How do I start this model with vLLM?
2
#4 opened about 1 month ago by SongXiaoMao
This quantized model is amazing
❤️👍 2
5
#1 opened about 1 month ago by hyunw55
Why is the file size of 4bit similar to FP8?
3
#2 opened 18 days ago by SongXiaoMao
Sensitive information is not a question
2
#3 opened 20 days ago by SongXiaoMao
vLLM 0.18.0 runs with an error
#2 opened 20 days ago by SongXiaoMao
I get an error using vLLM 0.18.0
1
#1 opened 23 days ago by SongXiaoMao
Launching with vLLM reports an error
#3 opened 28 days ago by SongXiaoMao
Tokenizer class TokenizersBackend does not exist in vllm v0.17.1
12
#26 opened about 1 month ago by putcn
Can you make a quantized model? Qwen3.5-122B-A10B-GPTQ-Int4
#2 opened about 1 month ago by SongXiaoMao
When will you fix the model replies missing </think>\n start tags?
18
#19 opened about 1 year ago by xldistance
When answering questions in Chinese, the model frequently terminates prematurely (outputs the end token). Is this a common problem?
1
#40 opened about 1 year ago by zhangw355
missing opening <think>
20
#4 opened about 1 year ago by chriswritescode
I tested dynamic 1.58-bit and 2.22-bit; all thoughts are empty?
9
#24 opened about 1 year ago by SongXiaoMao
No think tokens visible
6
#15 opened about 1 year ago by sudkamath
How to Pair with Larger Models
4
#7 opened over 1 year ago by windkkk
multi GPU inferencing
2
#18 opened over 1 year ago by cjj2003