Smoothie-Qwen3-4B-DTRO-Edition (Private)

Expert AI model for Daegu Transportation Corporation (DTRO) Line 3 power facilities - Lightweight Edition
Fine-tuned for Daegu Metro Line 3 Power Facility Expert System

🔴 This model is hosted in a private repository.
It is kept private because it was trained on internal work materials.

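📌 Model Overview

Basic Information

- Base model: dnotitia/Smoothie-Qwen3-4B
- Fine-tuning method: QLoRA (4-bit)
- Current version: v3.0 (Refined Dataset + Balanced LoRA)
- Training data: 1,273 QnA pairs on Daegu Transportation Corporation power facilities (refined set)
- Model size: about 2.4GB (Q4_K_M quantization)
- Tuning strategy: Balanced Mode (data refinement + balanced LoRA)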

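🎯 Project Goal

Provide a lightweight model for low-spec environments that is lighter and faster than the 8B model.

- ✅ Fast responses (about 2x faster than 8B)
- ✅ Low VRAM requirement (2.4GB model file, roughly 3GB of VRAM in use)
- ⚠️ Answer quality may be lower than 8B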

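🆚 8B vs 4B Comparison

| Item | 8B (Original) | 4B (Lightweight) |
|---|---|---|
| Model size | 5.0GB (Q4_K_M) | 2.4GB (Q4_K_M) |
| VRAM requirement | ~6GB | ~3GB |
| Inference speed | Moderate | Very fast (about 2x) |
| Answer quality | Excellent | Good (some limitations) |
| Use case | Complex analysis, precise remediation procedures | Quick lookups, simple Q&A |
| LoRA rank | 16 | 4 (balanced mode) |
| Learning rate | 2e-4 | 2e-5 (stable training) |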
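
As a rough sanity check on the model sizes in the comparison, a quantized GGUF file is approximately the parameter count times the average bits per weight, divided by 8. The sketch below assumes nominal parameter counts of 4.0B and 8.2B and an average of about 4.85 bits per weight for Q4_K_M. None of these three figures comes from this model card, so treat the result as an estimate only.

```python
# Rough GGUF file-size estimate: parameters x average bits per weight / 8.
# ASSUMPTIONS (not from the model card): nominal parameter counts and an
# average of ~4.85 bits per weight for Q4_K_M quantization.
ASSUMED_BITS_PER_WEIGHT = 4.85


def estimate_gguf_size_gb(num_params: float,
                          bits_per_weight: float = ASSUMED_BITS_PER_WEIGHT) -> float:
    """Return the approximate file size in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


size_4b = estimate_gguf_size_gb(4.0e9)  # assumed 4.0B parameters
size_8b = estimate_gguf_size_gb(8.2e9)  # assumed 8.2B parameters
print(f"4B: {size_4b:.1f} GB, 8B: {size_8b:.1f} GB")
```

Under these assumptions the estimates land close to the 2.4GB and 5.0GB figures listed for the two models. VRAM in use is higher than the file size because the KV cache and runtime buffers are added on top.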

🔧 Fine-tuning Details

v3.0 hyperparameters (current version)

# Balanced Mode settings (refined data + balanced LoRA)
LORA_R = 4              # v1: 8 → v2: 2 → v3: 4 (best balance found)
LORA_ALPHA = 8          # 2x r
LEARNING_RATE = 2e-5    # stable training
NUM_EPOCHS = 3          # suitable epoch count for the 4B model
BATCH_SIZE = 1
GRADIENT_ACCUMULATION = 16
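
To make the settings above concrete, the following sketch derives the quantities they imply: the LoRA scaling factor, the effective batch size, and an approximate optimizer step count. It assumes all 1,273 QnA pairs are used for training on a single device and that the last partial batch still triggers an update. The actual training script may differ on both points.

```python
import math

# Values copied from the v3.0 settings
LORA_R = 4
LORA_ALPHA = 8
NUM_EPOCHS = 3
BATCH_SIZE = 1
GRADIENT_ACCUMULATION = 16
NUM_SAMPLES = 1273  # QnA pairs listed in the model overview

# Standard LoRA scales the adapter update by alpha / r
lora_scaling = LORA_ALPHA / LORA_R

# Samples consumed per optimizer step on a single device
effective_batch_size = BATCH_SIZE * GRADIENT_ACCUMULATION

# ASSUMPTION: no held-out split, and the final partial batch still counts as a step
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch_size)
total_steps = steps_per_epoch * NUM_EPOCHS

print(lora_scaling, effective_batch_size, steps_per_epoch, total_steps)
```

With these numbers the whole run is only a few hundred optimizer steps, which is one reason a small learning rate and a low rank can still shift the model's behavior noticeably.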

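Training History

v1.0 (Rank 8, LR 1e-4) - Failed:
- Infinite repetition such as "저항 저항 저항..." ("resistance resistance resistance...")
- Catastrophic forgetting (loss of basic language ability)
v2.0 (Rank 2, LR 2e-5) - Partial success:
- Infinite repetition greatly reduced
- Repetition still occurred on the "라인테스트블로킹" (line test blocking) question
- Cause: repeated "Line test" patterns in the dataset
v3.0 (Rank 4, LR 2e-5) - Current version:
- ✅ Dataset refinement: removed 5 repeated "Line test" patterns
- ✅ Rank raised to 4: balances learning capacity and stability
- ✅ Most repetition problems resolved
- ⚠️ Minor repetition remains on some questions (3 table-related items)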
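
The card does not describe how the repeated "Line test" patterns were located in the dataset. One plausible approach, shown here purely as an illustration, is to flag any answer in which the same word n-gram occurs several times. The record layout (`question` / `answer` keys), the function names, and the threshold are all hypothetical.

```python
from collections import Counter


def max_ngram_repeats(text: str, n: int = 2) -> int:
    """Return how many times the most frequent word n-gram occurs in text."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0
    return Counter(ngrams).most_common(1)[0][1]


def filter_repetitive(records, n=2, threshold=3):
    """Split QnA records into (kept, flagged) based on repetition in the answer."""
    kept, flagged = [], []
    for rec in records:
        target = flagged if max_ngram_repeats(rec["answer"], n) >= threshold else kept
        target.append(rec)
    return kept, flagged


# Made-up sample records for illustration only (not from the real dataset)
samples = [
    {"question": "Q1", "answer": "Check the breaker and reset the relay."},
    {"question": "Q2", "answer": "Line test Line test Line test Line test blocking"},
]
kept, flagged = filter_repetitive(samples)
print(len(kept), len(flagged))
```

Flagged records would still need manual review, since legitimate technical answers can repeat a term several times.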

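⚠️ Known Limitations (v3.0)

1. Word repetition on specific questions (a fundamental limit of the 4B model)

Even after refining the dataset and retraining with Rank 4 in v3.0, the 4B model's limited capacity still causes minor repetition on some questions.

Repetition cases (about 3 of the table items):

- "항온항습기 이상" (thermo-hygrostat fault) question: repeats "cooling coil might be..."
- "강궤도빔 열선장치" (steel track beam heating-wire device) question: repeats "통신채널의정비는..."
- Rarely, on other questions that need complex technical explanations

Root cause:

- The 4B model has half the parameters of the 8B model and is weaker at generating the stop token
- It tends to learn repetition patterns from complex technical terms and long explanations
- Dataset refinement improved this by more than 80%, but a limit due to model size remains

Mitigations:

- ✅ Adjust repeat_penalty in the Modelfile (currently 1.2)
- ✅ Set a max_tokens limit
- ✅ Important: use the 8B model for complex questions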
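
The first two mitigations can also be applied per request rather than in the Modelfile. The sketch below builds a request body for Ollama's local `/api/chat` endpoint, where the token limit is expressed as `num_predict`. The model name `my-4b-model` follows the `ollama create` example in this card, and the specific values (1.3 and 256) are illustrative, not tuned recommendations.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local address


def build_chat_payload(model, question, repeat_penalty=1.2, num_predict=512):
    """Build a non-streaming chat request with per-request sampling options."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "options": {"repeat_penalty": repeat_penalty, "num_predict": num_predict},
    }


def ask(payload, url=OLLAMA_CHAT_URL):
    """POST the payload to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]


# Illustrative values: a slightly stronger penalty and a shorter output cap
payload = build_chat_payload("my-4b-model", "K50 고장 시 조치 방법은?",
                             repeat_penalty=1.3, num_predict=256)
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Calling `ask(payload)` requires a running Ollama server with the model already created. Options passed this way override the Modelfile defaults for that request only.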

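2. Exposed English reasoning (thinking)

The 4B model cannot fully control the <think> tag, so English reasoning text may leak into the middle of an answer.

Mitigations:

- Lower temperature in the Modelfile (0.5 → 0.3)
- State "answer in Korean only" in the system prompt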
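
In addition to the two settings above, leaked reasoning can be removed after generation. This is a post-processing workaround, not something the card describes, and the function name is hypothetical. The sketch strips complete `<think>...</think>` blocks and also handles a tag that was opened or closed without its partner.

```python
import re

# Matches a complete reasoning block, including newlines inside it
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think> reasoning from model output and return the visible answer."""
    cleaned = _THINK_BLOCK.sub("", text)

    # Unclosed <think>: everything after it is reasoning, so drop it
    open_idx = cleaned.find("<think>")
    if open_idx != -1:
        cleaned = cleaned[:open_idx]

    # Orphan </think>: everything before it is reasoning, so keep only what follows
    close_idx = cleaned.rfind("</think>")
    if close_idx != -1:
        cleaned = cleaned[close_idx + len("</think>"):]

    return cleaned.strip()


print(strip_thinking("<think>Let me reason about this.</think>차단기를 점검합니다."))
```

This only catches reasoning that is wrapped in the tags. English text that leaks without any tag would need a separate check, such as language detection.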

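3. Limits on complex reasoning

With fewer parameters than the 8B model, it is limited at multi-step reasoning and compound fault analysis.

Recommended use cases:

- ✅ Looking up simple fault remediation steps
- ✅ Explaining facility names and terms
- ✅ Checking numeric information (voltage, resistance values, etc.)
- ❌ Root-cause analysis of compound faults → use 8B
- ❌ Deriving multi-step procedures → use 8B
- ❌ Complex table-based analysis → use 8B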

🚀 Usage

1. Using Ollama (recommended)

Modelfile settings (v3.0):

FROM ./Smoothie-Qwen3-4B-DTRO-Edition-v3.0-Q4_K_M.gguf

PARAMETER temperature 0.5
PARAMETER top_p 0.8
PARAMETER repeat_penalty 1.2
PARAMETER presence_penalty 0.6
PARAMETER frequency_penalty 0.6
PARAMETER num_ctx 8192
PARAMETER num_predict 4096

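# System prompt (Korean), in English: You are an expert on Daegu Transportation Corporation Line 3 power facilities.
# Answers are concise and do not repeat. All responses are written in pure Korean.
SYSTEM """당신은 대구교통공사 3호선 전력설비 전문가입니다.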
답변은 반복하지 않고 간결하게 작성합니다.
모든 응답은 순수 한글로 작성합니다."""

Run (the example prompt asks how to handle a K50 fault):

ollama create my-4b-model -f Modelfile
ollama run my-4b-model "K50 고장 시 조치 방법은?"

2. Python (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "bluejude10/Smoothie-Qwen3-4B-DTRO-Edition",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "bluejude10/Smoothie-Qwen3-4B-DTRO-Edition",
    trust_remote_code=True
)

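# System: "You are a power facility expert." / User: "What should be done when PLC communication fails?"
messages = [
    {"role": "system", "content": "당신은 전력설비 전문가입니다."},
    {"role": "user", "content": "PLC 통신 이상 시 조치법은?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.5)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))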

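📊 Performance (Informal Benchmark)

| Question type | 8B quality | 4B quality | Notes |
|---|---|---|---|
| Simple remediation steps | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Provides the essentials |
| Numeric information | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Same accuracy |
| Complex analysis | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Lacks depth |
| Speed | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 2x faster |
| Stability | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Occasionally "talks to itself" (exposed reasoning) |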

📁 Repository Layout

.
├── README.md
├── WORK_LOG_4B.md                                  # Fine-tuning work log
├── Smoothie-Qwen3-4B-DTRO-Edition-v3.0-Q4_K_M.gguf # Latest model
├── Modelfile.dtro.4b
├── finetune_qlora_4b.py
├── merge_lora.py
└── model_comparison_report.md                      # Reference: comparison with 8B

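🔐 Security and Privacy Policy

This model is kept in a private repository for the following reasons:

- The training data includes internal Daegu Transportation Corporation work materials
- It may contain sensitive information such as fault history and remediation procedures
- The lightweight model's output instability (for example, exposed reasoning) makes it unsuitable for public release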

📜 License

- Model: Apache 2.0 (inherited from the base model license)
- Data: Private

👤 Developer

- Developer: 강동우 (bluejude10)
- Contact: (private)
- Purpose: Assisting fault response for Daegu Transportation Corporation Line 3 power facilities

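📚 References

Technical documentation

⚠️ Caution: This is an experimental lightweight model. v3.0 improved it through data refinement and retraining, but the fundamental limits of a 4B model remain.

- Word repetition may occur on some questions (about 3 items)
- For complex analysis or safety-related work, always use the 8B model or have an expert verify the output
- Thorough validation is recommended before production use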
