---
base_model:
- StevesInfinityDrive/DeepSeek-R1-Distill-Qwen-1.0B
library_name: peft
---

# Model Card for DeepSeek-R1-Distill-Qwen-1.0B

## Model Details

### Model Description

DeepSeek-R1-Distill-Qwen-1.0B is a distilled version of the DeepSeek-R1 model, designed for efficiency while maintaining strong performance across a range of NLP tasks. It has been fine-tuned with PEFT (parameter-efficient fine-tuning) to adapt it for downstream applications, particularly chatbot interaction, document summarization, and contextual understanding.

- Developed by: StevesInfinityDrive
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: StevesInfinityDrive
- Model type: Distilled Transformer-based Language Model
- Language(s) (NLP): English, Chinese (potential multilingual capability)
- License: [More Information Needed]
- Finetuned from model [optional]: DeepSeek-R1-Qwen-1.0B

### Model Sources

- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

## Uses

### Direct Use

This model can be used directly for NLP tasks such as:

- Chatbot applications
- Summarization and content generation
- Code completion and assistance
- Sentiment analysis
- General language understanding tasks

### Downstream Use

Fine-tuning this model with PEFT allows for customization in:

- Domain-specific NLP applications (e.g., legal, medical, finance)
- Personalized AI assistants
- Specialized chatbots

### Out-of-Scope Use

- Not suitable, without further fine-tuning, for applications that require real-time, high-accuracy responses.
- Should not be used to generate biased, unethical, or misleading content.
- May underperform on highly technical or niche domain-specific queries.

## Bias, Risks, and Limitations

- The model may exhibit biases present in its training data.
- It may generate hallucinated or incorrect responses.
- Limited interpretability in decision-making processes.

### Recommendations

Users should carefully evaluate outputs, especially in critical applications such as healthcare, law, or finance. Mitigating bias through additional fine-tuning and prompt engineering is recommended.

## How to Get Started with the Model

Use the following code to load the model. Note that this card's metadata lists the same repo id as both the base model and the PEFT adapter; if the adapter lives in a separate repo, substitute its id in `PeftModel.from_pretrained`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_id = "StevesInfinityDrive/DeepSeek-R1-Distill-Qwen-1.0B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach the PEFT adapter weights on top of the base model.
peft_model = PeftModel.from_pretrained(model, model_id)

# Quick smoke test: generate a short completion.
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = peft_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Data

- Distilled from DeepSeek-R1-Qwen-1.0B via knowledge distillation.
- Includes a mixture of publicly available datasets for NLP tasks.
- Additional details on preprocessing and fine-tuning pipelines are needed.

## Training Procedure

### Preprocessing

- Standard NLP tokenization and text cleaning applied.
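The card does not document the exact pipeline; as an illustration, a "standard" cleaning pass typically normalizes Unicode and collapses whitespace before tokenization:

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    """Illustrative cleaning pass; not the model's documented pipeline."""
    text = unicodedata.normalize("NFC", text)  # canonical Unicode composition
    text = re.sub(r"\s+", " ", text)           # collapse whitespace runs (incl. NBSP)
    return text.strip()

print(clean_text("  Hello,\u00a0 world!\n\n"))  # -> "Hello, world!"
```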

### Training Hyperparameters

- Precision: fp16 mixed precision
- Batch Size: [More Information Needed]
- Learning Rate: [More Information Needed]
- Training Steps: [More Information Needed]

### Speeds, Sizes, Times

- Model size: ~1B parameters (0.9B stored as F32 safetensors)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Evaluated on standard NLP benchmarks

#### Factors

- Performance varies depending on the prompt complexity and domain
- Bias detection and mitigation strategies are still under analysis

#### Metrics

- Perplexity
- BLEU Score
- F1 Score (for classification tasks)
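Of these, perplexity follows directly from the model's token-level loss: it is the exponential of the mean negative log-likelihood per token. A minimal illustration with made-up loss values:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Made-up per-token losses in nats, not measured values for this model.
losses = [2.1, 1.8, 2.4, 2.0]
print(round(perplexity(losses), 2))  # lower is better
```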

### Summary

The model shows strong performance for general NLP tasks but may require domain-specific fine-tuning for optimal performance.

## Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

## Technical Specifications

### Model Architecture and Objective

- Transformer-based autoregressive model
- Optimized for inference efficiency through distillation
- Parameter-efficient fine-tuning via PEFT

### Compute Infrastructure

#### Software

- Framework: PyTorch
- Libraries: transformers, peft, accelerate
- Compatible with the Hugging Face API