---
base_model:
- StevesInfinityDrive/DeepSeek-R1-Distill-Qwen-1.0B
library_name: peft
---

# Model Card for DeepSeek-R1-Distill-Qwen-1.0B

## Model Details

### Model Description

DeepSeek-R1-Distill-Qwen-1.0B is a distilled version of the DeepSeek-R1 model, designed for efficiency while maintaining strong performance across a range of NLP tasks. It has been fine-tuned with PEFT (parameter-efficient fine-tuning) to adapt it for downstream applications, particularly chatbot interaction, document summarization, and contextual understanding.

- Developed by: StevesInfinityDrive
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: StevesInfinityDrive
- Model type: Distilled Transformer-based Language Model
- Language(s) (NLP): English, Chinese (potential multilingual capability)
- License: [More Information Needed]
- Finetuned from model [optional]: DeepSeek-R1-Qwen-1.0B

### Model Sources

- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]

## Uses

### Direct Use

This model can be used directly for NLP tasks such as:

- Chatbot applications
- Summarization and content generation
- Code completion and assistance
- Sentiment analysis
- General language understanding tasks

### Downstream Use

Fine-tuning this model with PEFT allows for customization in:

- Domain-specific NLP applications (e.g., legal, medical, finance)
- Personalized AI assistants
- Specialized chatbots

### Out-of-Scope Use

- Not suitable, without further fine-tuning, for applications that require real-time, high-accuracy responses.
- Should not be used to generate biased, unethical, or misleading content.
- May underperform on highly technical or niche domain-specific queries.

## Bias, Risks, and Limitations

- The model may exhibit biases present in its training data.
- It may generate hallucinated or incorrect responses.
- Limited interpretability in decision-making processes.

### Recommendations

Users should carefully evaluate outputs, especially in critical applications such as healthcare, law, or finance. Mitigating bias through additional fine-tuning and prompt engineering is recommended.

## How to Get Started with the Model

Use the following code to load the model. Note that this card's metadata lists the same repo id as both the base model and the PEFT adapter; if the adapter lives in a separate repo, substitute its id in `PeftModel.from_pretrained`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_id = "StevesInfinityDrive/DeepSeek-R1-Distill-Qwen-1.0B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach the PEFT adapter weights on top of the base model.
peft_model = PeftModel.from_pretrained(model, model_id)

# Quick smoke test: generate a short completion.
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = peft_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Data

- Distilled from DeepSeek-R1-Qwen-1.0B via knowledge distillation.
- Includes a mixture of publicly available datasets for NLP tasks.
- Additional details on preprocessing and fine-tuning pipelines are needed.

## Training Procedure

### Preprocessing

- Standard NLP tokenization and text cleaning applied.
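The card does not document the exact pipeline; as an illustration, a "standard" cleaning pass typically normalizes Unicode and collapses whitespace before tokenization:

```python
import re
import unicodedata

def clean_text(text: str) -> str:
    """Illustrative cleaning pass; not the model's documented pipeline."""
    text = unicodedata.normalize("NFC", text)  # canonical Unicode composition
    text = re.sub(r"\s+", " ", text)           # collapse whitespace runs (incl. NBSP)
    return text.strip()

print(clean_text("  Hello,\u00a0 world!\n\n"))  # -> "Hello, world!"
```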

### Training Hyperparameters

- Precision: fp16 mixed precision
- Batch Size: [More Information Needed]
- Learning Rate: [More Information Needed]
- Training Steps: [More Information Needed]

### Speeds, Sizes, Times

- Model size: ~1B parameters (0.9B stored as F32 safetensors)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Evaluated on standard NLP benchmarks

#### Factors

- Performance varies depending on the prompt complexity and domain
- Bias detection and mitigation strategies are still under analysis

#### Metrics

- Perplexity
- BLEU Score
- F1 Score (for classification tasks)
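Of these, perplexity follows directly from the model's token-level loss: it is the exponential of the mean negative log-likelihood per token. A minimal illustration with made-up loss values:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Made-up per-token losses in nats, not measured values for this model.
losses = [2.1, 1.8, 2.4, 2.0]
print(round(perplexity(losses), 2))  # lower is better
```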

### Summary

The model shows strong performance for general NLP tasks but may require domain-specific fine-tuning for optimal performance.

## Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

## Technical Specifications

### Model Architecture and Objective

- Transformer-based autoregressive model
- Optimized for inference efficiency through distillation
- Parameter-efficient fine-tuning via PEFT

### Compute Infrastructure

#### Software

- Framework: PyTorch
- Libraries: transformers, peft, accelerate
- Compatible with the Hugging Face API