deepseek-ai/DeepSeek-R1

#185 opened about 1 year ago by

EdilCamil

Holding paper in hand

#184 opened about 1 year ago by

Loveyl

Update config.json

#182 opened about 1 year ago by

Empolean2640

Regression in Reasoning Tag Output - Missing <think> in Model Responses

#181 opened about 1 year ago by

divinerapier

Delete model.safetensors.index.json

#180 opened about 1 year ago by

Huggingfaceliaj

Unknown quantization type, got fp8

#179 opened about 1 year ago by

DenisFavaCerchiaro

How to disable/skip the <think></think> process.

#178 opened about 1 year ago by

yech520

Request: DOI

🤗 1

#177 opened about 1 year ago by

Tamwyn

Request: DOI

#176 opened about 1 year ago by

saathwik

Request: DOI

#175 opened about 1 year ago by

Paulabad

Draft model as accelerator for DeepSeek-R1?

#174 opened about 1 year ago by

inputout

Deploying production ready service with Unsloth GGUF quants on your AWS account. (4 x L40S)

🔥 2

8

#171 opened about 1 year ago by

samagra-tensorfuse

Could you take a look at the "r1-1776" model released by Perplexity?

#170 opened about 1 year ago by

yanyihan

Just crossed 10,000 likes!

#169 opened about 1 year ago by

clem

Unable to download flash_attn on Mac

#168 opened about 1 year ago by

earlyIsLate

Can this model be used for commercial use?

#167 opened about 1 year ago by

henrycwf

90+ tokens per second for MI300x8 using batch_size = 1

#166 opened about 1 year ago by

ghostplant

RytryR1

#165 opened about 1 year ago by

Rocka01

"aha moment" comment deleted by Perplexity (recovered)

👍 1

#164 opened about 1 year ago by

FalconNet

Garbled output

#163 opened about 1 year ago by

cell22

'num_hidden_layers': 61, but layer 62 has weights.

#162 opened about 1 year ago by

xinhe

Upload GTG Breaking every Limit

#161 opened about 1 year ago by

GTGenesis

support prefix complete

❤️👍 3

#158 opened about 1 year ago by

HuggineAllen

Create app.py

#157 opened about 1 year ago by

SpaceAgeRobotics

Create 1

#156 opened about 1 year ago by

madevii

Brokersponsor

#155 opened about 1 year ago by

Brokersponsor

Update README.md

#154 opened about 1 year ago by

egegvner

Upload IMG_4530.png

#152 opened about 1 year ago by

Noemie202586

Upload IMG_1745.JPG

#151 opened about 1 year ago by

Ladib

Create Clara

#150 opened about 1 year ago by

Clblinks

If I understand correctly, evaluating MATH-500 requires 64*500 model calls?

#149 opened about 1 year ago by

Rorschaaaach

Request: DOI

🚀 1

#148 opened about 1 year ago by

Tarush-Appreciate

Update README.md

#147 opened about 1 year ago by

tekno-power

Update README.md

#146 opened about 1 year ago by

Ekimnedops6969

Update README.md

❤️ 1

#143 opened about 1 year ago by

MuhammadEhsan

Request for Information on Purchasing Reasoning API Key

#142 opened about 1 year ago by

brahamaandai

ssss

🔥 1

#140 opened about 1 year ago by

DZGT

Update model_max_length in tokenizer_config.json

👍 3

#139 opened about 1 year ago by

kkokkie2360

Host of the model

#138 opened about 1 year ago by

henrycwf

Lite version for DeepSeek-R1?

👍👀 6

#137 opened about 1 year ago by

haili-tian

[Bug] assert not self.training

#136 opened about 1 year ago by

Gaie

Upload IMG_0253.HEIC

#134 opened about 1 year ago by

rynty

Upload comment-sample.xlsx

#133 opened about 1 year ago by

faham123

non-reasoning data

#132 opened about 1 year ago by

mccatec

Could you release some 4-bit weights? None of the GPUs I currently have support FP8

🔥 2

#131 opened about 1 year ago by

zhnagchenchne

For the universe! DeepPhaser.py DeepCoralX.py and DeepSynapse.py

❤️👀 2

#129 opened about 1 year ago by

karmikovic

Request: Create distill of Mistral Small 24B

#128 opened about 1 year ago by

Kenshiro-28

Which vision model does R1 use for text extraction from images or PDFs?