🧠 LydiaTM-Qwen3-14B - TW Reasoning (Merged GGUF)

This repository contains the GGUF export of a fine-tuned Qwen3-14B model, trained using Unsloth and TRL's SFTTrainer on the high-quality twinkle-ai/tw-reasoning-instruct-50k dataset.
The model specializes in reasoning, chain-of-thought, and instruction following in Chinese and English.

The model is merged: LoRA adapters were fused into the base model before GGUF conversion. It is fully offline and compatible with llama.cpp, LM Studio, KoboldCPP, and Ollama.


πŸ“¦ Model Summary

  • Base Model: unsloth/Qwen3-14B-unsloth-bnb-4bit
  • Format: GGUF
  • Precision: F16
  • Type: Text-only LLM
  • Training Method: QLoRA (via Unsloth)
  • Dataset: twinkle-ai/tw-reasoning-instruct-50k
  • Languages: Chinese (primary), English (secondary)
  • Specialization: Reasoning, chain-of-thought, instruction following

πŸ”§ Fine-Tuning Details

Training Framework

  • Unsloth for fast & memory-efficient QLoRA fine-tuning
  • TRL (SFTTrainer) for supervised instruction tuning
  • BitsAndBytes for 4-bit quantization and 8-bit optimizers
  • Accelerate for distributed / device management

LoRA Configuration

r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
  q_proj, k_proj, v_proj, o_proj,
  gate_proj, up_proj, down_proj

Sequence Length

max_seq_length = 2048

Note: The underlying Qwen3 architecture still supports long-context inference up to roughly 128K tokens (32K natively, extendable via YaRN rope scaling). Fine-tuning was done at a 2K sequence length for efficiency, but the model can still be used with larger contexts at inference time.

Training Data Structure

Each example in tw-reasoning-instruct-50k contains:

  • input – user question / instruction
  • think – chain-of-thought reasoning
  • output – final answer

During SFT, the assistant response was formatted roughly as:

    {reasoning}

    {final_answer}
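For illustration, here is how the three dataset fields might be assembled into a single assistant turn. The <think>…</think> wrapper follows Qwen3's reasoning convention; the exact template used during training is an assumption, not taken from the repo:

```python
def build_sft_target(example: dict) -> str:
    """Combine a tw-reasoning-instruct-50k record into one assistant response.

    example["input"] becomes the user turn elsewhere in the chat template;
    here we only build the assistant side: reasoning, then the final answer.
    The <think> tags mirror Qwen3's format and are an assumption.
    """
    return (
        "<think>\n"
        f"{example['think'].strip()}\n"
        "</think>\n\n"
        f"{example['output'].strip()}"
    )

sample = {
    "input": "Why is the sky blue?",
    "think": "Sunlight scatters off air molecules; shorter (blue) wavelengths scatter most.",
    "output": "Because of Rayleigh scattering, blue light is scattered far more than red.",
}
print(build_sft_target(sample))
```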

This encourages better reasoning while still allowing users to suppress chain-of-thought at inference with appropriate prompting; Qwen3's chat template, for instance, recognizes a /no_think directive that disables the reasoning trace.
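When the reasoning trace is not wanted downstream, it can also be stripped after generation. A minimal post-processing sketch, assuming the model emits Qwen3-style <think>…</think> blocks:

```python
import re

# Matches a <think>...</think> block plus any trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks, keeping only the answer."""
    return THINK_RE.sub("", text).strip()

raw = "<think>\nBlue light scatters more.\n</think>\n\nThe sky is blue due to Rayleigh scattering."
print(strip_think(raw))  # -> The sky is blue due to Rayleigh scattering.
```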


πŸ”„ GGUF Conversion

The merged HF model was converted using convert_hf_to_gguf.py from llama.cpp, for example:

python convert_hf_to_gguf.py \
  ./qwen3-14b-tw-reasoning-merged \
  --outfile qwen3-14b.F16.gguf \
  --outtype f16

Available file(s):

  • qwen3-14b.F16.gguf

πŸš€ Usage Examples

llama.cpp (CLI, Hugging Face Hub)

llama-cli -hf <username>/<repo> -p "η‚Ίδ»€ιΊΌε€©η©Ίζ˜―θ—θ‰²ηš„οΌŸ"

llama.cpp (local file)

llama-cli -m qwen3-14b.F16.gguf -p "Why is the sky blue?"

LM Studio / KoboldCPP / Text-Generation-WebUI

  • Place qwen3-14b.F16.gguf into your models directory
  • Load it as a normal GGUF model and start chatting

Ollama

Example Modelfile (Qwen3 uses the ChatML prompt format, not Mistral's [INST] tags):

FROM qwen3-14b.F16.gguf
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

Create and run:

ollama create lydia-qwen3-14b -f Modelfile
ollama run lydia-qwen3-14b "解釋量子糾纏"


πŸ“Š Intended Use

  • Complex reasoning and explanation in Chinese
  • Instruction following and educational Q&A
  • Chain-of-thought–friendly tasks
  • Offline personal assistants and local agents

⚠️ Limitations

  • May produce verbose chain-of-thought unless instructed otherwise
  • Can hallucinate or be overconfident outside its training distribution
  • Not specifically optimized for coding or tool use
  • GGUF format cannot be further fine-tuned directly

πŸ™Œ Acknowledgements

  • Unsloth for efficient fine-tuning infrastructure
  • twinkle-ai for the tw-reasoning-instruct-50k dataset
  • The Qwen team for the Qwen3 architecture
  • The llama.cpp / GGUF ecosystem and community tooling (LM Studio, KoboldCPP, Ollama, etc.)