Instructions for using Surpem/Supertron2-1.7B with libraries and local apps.

Libraries

How to use Surpem/Supertron2-1.7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Surpem/Supertron2-1.7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Surpem/Supertron2-1.7B")
model = AutoModelForCausalLM.from_pretrained("Surpem/Supertron2-1.7B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
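
The slice in the final `print` call is there because, for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones. The sketch below illustrates the same indexing on plain Python lists. The token ids are made up and `strip_prompt` is a hypothetical helper, not part of Transformers.

```python
def strip_prompt(sequence, prompt_length):
    """Return only the tokens that follow the prompt."""
    return sequence[prompt_length:]


# Made-up token ids, used only to show the indexing.
prompt_ids = [101, 7592, 2088]
full_output = prompt_ids + [2023, 2003, 102]  # prompt followed by new tokens

new_tokens = strip_prompt(full_output, len(prompt_ids))
print(new_tokens)  # [2023, 2003, 102]
```

In the snippet above, `inputs["input_ids"].shape[-1]` plays the role of `len(prompt_ids)`.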

Local Apps

vLLM

How to use Surpem/Supertron2-1.7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Surpem/Supertron2-1.7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron2-1.7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
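
The same request can be issued from Python. This is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_content):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "http://localhost:8000",
    "Surpem/Supertron2-1.7B",
    "What is the capital of France?",
)
print(request.full_url)
# Uncomment once the server is running:
# print(send_chat_request(request))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.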

Use Docker

docker model run hf.co/Surpem/Supertron2-1.7B

SGLang

How to use Surpem/Supertron2-1.7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Surpem/Supertron2-1.7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron2-1.7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Surpem/Supertron2-1.7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Surpem/Supertron2-1.7B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
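
Because the endpoint is OpenAI-compatible, the JSON it returns should follow the chat completion shape, with the assistant text under `choices[0].message.content`. The sketch below pulls that field out. `extract_reply` is a hypothetical helper, and `sample_response` is a hand-written placeholder in that shape, not real model output.

```python
def extract_reply(response):
    """Return the assistant text from the first choice of a chat completion."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample in the chat completion shape, not real model output.
sample_response = {
    "object": "chat.completion",
    "model": "Surpem/Supertron2-1.7B",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "placeholder reply"},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(sample_response))
```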

Docker Model Runner

How to use Surpem/Supertron2-1.7B with Docker Model Runner:

```
docker model run hf.co/Surpem/Supertron2-1.7B
```

Supertron2-1.7B / README.md

Ill-Ness

Update README.md

a8c2293 verified 12 days ago


	---
	license: apache-2.0
	language:
	- en
	base_model:
	- Qwen/Qwen3-1.7B
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- reasoning
	- math
	- coding
	- instruction-tuned
	- pytorch
	---
	# Supertron2-1.7B: A Compact, Efficient Instruction-Tuned Language Model
	## Model Description
	Supertron2-1.7B is an instruction-tuned language model built on top of Qwen3-1.7B. Designed to be a reliable, efficient daily driver, it delivers strong performance across math, coding, reasoning, science, general knowledge, and general conversation while remaining lightweight enough to run on consumer hardware.

	* Developed by: Surpem
	* Model type: Causal Language Model
	* Architecture: Dense Transformer, 1.7B parameters
	* Fine-tuned from: [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
	* License: Apache 2.0

	---

	## Capabilities

	### Reasoning
	Supertron2-1.7B is designed for clear multi-step reasoning. It works through a problem methodically, breaking it into structured steps, rather than jumping directly to a final answer.

	### Math
	The model handles a range of math tasks, from arithmetic and algebra to word problems and structured problem solving. It is useful for explaining steps, checking calculations, and producing concise final answers.

	### Coding
	Supertron2-1.7B can write, debug, and explain code in popular languages such as Python, JavaScript, and C++. It understands syntax, common programming patterns, algorithmic reasoning, and practical implementation details.

	### Science & General Knowledge
	Broad instruction tuning across science, STEM, and general knowledge domains means the model can hold technical conversations, explain difficult concepts clearly, and assist with research, writing, and analysis tasks.

	### Instruction Following
	The model is responsive to natural language instructions. Whether you need concise answers, detailed explanations, structured output, or creative writing, Supertron2-1.7B adapts to the format and tone you ask for without needing complex prompting tricks.

	---

	## Get Started
	```python
	from transformers import AutoTokenizer, AutoModelForCausalLM
	import torch

	model_id = "Surpem/Supertron2-1.7B"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)

	messages = [
	    {"role": "user", "content": "Explain the difference between LoRA and full fine-tuning."}
	]

	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=512)
	print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
	```

	---

	## Hardware Requirements
	| Precision | Min VRAM | Recommended |
	|---|---|---|
	| bfloat16 | 5 GB | 8 GB+ |
	| 4-bit quantized | 3 GB | 4 GB+ |

	For 4-bit quantized inference:
	```python
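	import torch
	from transformers import AutoModelForCausalLM, BitsAndBytesConfig

	model_id = "Surpem/Supertron2-1.7B"

	# Requires the bitsandbytes package. Weights load in 4-bit; compute runs in bfloat16.
	bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
	model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")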
	```
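
As a rough sanity check on the table above, the memory taken by the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 1.7 billion parameters (taken from the model name, not an exact count) and ignores activations, the KV cache, and runtime overhead. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 1.7e9  # assumption: nominal count taken from the model name

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"bfloat16 weights: ~{bf16_gb:.2f} GB")
print(f"4-bit weights:    ~{int4_gb:.2f} GB")
```

This gives roughly 3.4 GB for bfloat16 and 0.85 GB for 4-bit weights. The gap to the minimums in the table is presumably headroom for activations, the KV cache, and any layers that a 4-bit loader keeps in higher precision.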

	---

	## Citation
	```bibtex
	@misc{surpem2026supertron2-1.7b,
	  title={Supertron2-1.7B --- Efficient Instruction-Tuned Language Model},
	  author={Surpem},
	  year={2026},
	  url={https://huggingface.co/Surpem/Supertron2-1.7B},
	}
	```