Mini-LLM

Mini-LLM is a project that replicates mainstream open-source model architectures under limited computational resources by implementing mini models with 100-200M parameters. It focuses on learning and reproducing these architectures, and it provides complete training and inference pipelines. For more details, see the Mini-LLM project.

Usage

Using Transformers Library

Import the mini_models module first so that the custom Mini-LLM models are registered, then load the model with AutoModelForCausalLM:

import mini_models  # Register custom Mini-LLM models
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("WKQ9411/Mini-DeepSeekV3-160M-A100M-SFT")
tokenizer = AutoTokenizer.from_pretrained("WKQ9411/Mini-DeepSeekV3-160M-A100M-SFT")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Build a chat-formatted prompt and generate a reply
messages = [{"role": "user", "content": "你好,你是谁?"}]  # "Hello, who are you?"
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(formatted_text, return_tensors="pt")["input_ids"].to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output_ids[0][len(input_ids[0]):], skip_special_tokens=True)
print(response)
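
The slice in the decode step is easy to misread, so here is a minimal sketch of what it does, using plain Python lists instead of tensors. It assumes the usual behavior of decoder-only models, where the generated sequence starts with the prompt tokens and continues with the new ones. The helper name `strip_prompt` and the token ids are hypothetical and chosen only for illustration.

```python
def strip_prompt(generated_ids, prompt_len):
    """Return only the tokens that follow the first `prompt_len` tokens."""
    return generated_ids[prompt_len:]


# Hypothetical token ids: the first three stand for the prompt,
# the remaining ones for the newly generated continuation.
prompt_ids = [101, 2023, 102]
generated_ids = prompt_ids + [7592, 2088, 0]

new_ids = strip_prompt(generated_ids, len(prompt_ids))
print(new_ids)  # [7592, 2088, 0]
```

Decoding only the sliced ids keeps the printed response free of the chat-formatted prompt.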

Using Custom Interface

from mini_models import get_model_and_config
from transformers import AutoTokenizer
import torch

Model, Config = get_model_and_config("mini_deepseekv3")
model = Model.from_pretrained("path/to/your/model")
tokenizer = AutoTokenizer.from_pretrained("path/to/your/tokenizer")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Build a chat-formatted prompt and generate a reply
messages = [{"role": "user", "content": "你好,你是谁?"}]  # "Hello, who are you?"
formatted_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
input_ids = tokenizer(formatted_text, return_tensors="pt")["input_ids"].to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=100)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output_ids[0][len(input_ids[0]):], skip_special_tokens=True)
print(response)
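
Both examples send a single user message. As a rough sketch, a follow-up turn could be prepared by extending the same `messages` list with role/content dictionaries before calling `apply_chat_template` again. The helper `add_turn` is hypothetical, and whether this model's chat template handles multi-turn or system messages well is an assumption that the source does not confirm.

```python
def add_turn(messages, role, content):
    """Return a new message list with one more role/content entry appended."""
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unexpected role: {role}")
    return messages + [{"role": role, "content": content}]


messages = [{"role": "user", "content": "你好,你是谁?"}]  # "Hello, who are you?"
# Placeholder reply; in practice this would be the decoded model response.
messages = add_turn(messages, "assistant", "<model reply>")
messages = add_turn(messages, "user", "请介绍一下你自己。")  # "Please introduce yourself."

print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```

The resulting list can then be passed to `tokenizer.apply_chat_template` in the same way as the single-message list above.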

Training Data

The model was trained on:

  • Pre-training: a 20% sampled subset of the OpenCSG Fineweb-Edu-Chinese-V2.1 dataset (high-quality Chinese educational content)
  • Supervised fine-tuning (SFT): the DeepCtrl SFT dataset from ModelScope
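
The source does not say how the 20% subset was drawn. As an illustration only, a reproducible random subset of a given fraction could be taken as follows. The function `sample_subset`, the seed, and the stand-in document list are all hypothetical and are not the project's actual sampling code.

```python
import random


def sample_subset(items, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of `items`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = int(len(items) * fraction)
    rng = random.Random(seed)  # fixed seed so each run draws the same subset
    return rng.sample(items, k)


# Hypothetical stand-in for a corpus of documents.
docs = [f"doc-{i}" for i in range(100)]
subset = sample_subset(docs, 0.2)
print(len(subset))  # 20
```

Fixing the seed makes the subset repeatable, which matters when a pre-training run needs to be reproduced or compared against another run.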

Limitations

This is a small-scale model designed for educational and research purposes. It may not perform as well as larger models on complex tasks.

Model size: 0.2B params (Safetensors); tensor type: F32