🧠 LydiaTM-Qwen3-14B - TW Reasoning (Merged GGUF)

This repository contains the GGUF export of a fine-tuned Qwen3-14B model, trained using Unsloth and TRL's SFTTrainer on the high-quality twinkle-ai/tw-reasoning-instruct-50k dataset.
The model specializes in reasoning, chain-of-thought, and instruction following in Chinese and English.

The model is merged: LoRA adapters were fused into the base model before GGUF conversion. It is fully offline and compatible with llama.cpp, LM Studio, KoboldCPP, and Ollama.


πŸ“¦ Model Summary

  • Base Model: unsloth/Qwen3-14B-unsloth-bnb-4bit
  • Format: GGUF
  • Precision: F16
  • Type: Text-only LLM
  • Training Method: QLoRA (via Unsloth)
  • Dataset: twinkle-ai/tw-reasoning-instruct-50k
  • Languages: Chinese (primary), English (secondary)
  • Specialization: Reasoning, chain-of-thought, instruction following

πŸ”§ Fine-Tuning Details

Training Framework

  • Unsloth for fast & memory-efficient QLoRA fine-tuning
  • TRL (SFTTrainer) for supervised instruction tuning
  • BitsAndBytes for 4-bit quantization and 8-bit optimizers
  • Accelerate for distributed / device management

LoRA Configuration

r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
  q_proj, k_proj, v_proj, o_proj,
  gate_proj, up_proj, down_proj

Sequence Length

max_seq_length = 2048

Note: The underlying Qwen3 architecture still supports long-context inference up to roughly 128K tokens (32K natively, extendable via YaRN rope scaling). Fine-tuning was done at a 2K sequence length for efficiency, but the model can still be used with larger contexts at inference time.

Training Data Structure

Each example in tw-reasoning-instruct-50k contains:

  • input – user question / instruction
  • think – chain-of-thought reasoning
  • output – final answer

During SFT, the assistant response was formatted roughly as:

    {reasoning}

    {final_answer}
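For illustration, here is how the three dataset fields might be assembled into a single assistant turn. The <think>…</think> wrapper follows Qwen3's reasoning convention; the exact template used during training is an assumption, not taken from the repo:

```python
def build_sft_target(example: dict) -> str:
    """Combine a tw-reasoning-instruct-50k record into one assistant response.

    example["input"] becomes the user turn elsewhere in the chat template;
    here we only build the assistant side: reasoning, then the final answer.
    The <think> tags mirror Qwen3's format and are an assumption.
    """
    return (
        "<think>\n"
        f"{example['think'].strip()}\n"
        "</think>\n\n"
        f"{example['output'].strip()}"
    )

sample = {
    "input": "Why is the sky blue?",
    "think": "Sunlight scatters off air molecules; shorter (blue) wavelengths scatter most.",
    "output": "Because of Rayleigh scattering, blue light is scattered far more than red.",
}
print(build_sft_target(sample))
```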

This encourages better reasoning while still allowing users to suppress chain-of-thought at inference with appropriate prompting; Qwen3's chat template, for instance, recognizes a /no_think directive that disables the reasoning trace.
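When the reasoning trace is not wanted downstream, it can also be stripped after generation. A minimal post-processing sketch, assuming the model emits Qwen3-style <think>…</think> blocks:

```python
import re

# Matches a <think>...</think> block plus any trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks, keeping only the answer."""
    return THINK_RE.sub("", text).strip()

raw = "<think>\nBlue light scatters more.\n</think>\n\nThe sky is blue due to Rayleigh scattering."
print(strip_think(raw))  # -> The sky is blue due to Rayleigh scattering.
```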


πŸ”„ GGUF Conversion

The merged HF model was converted using convert_hf_to_gguf.py from llama.cpp, for example:

python convert_hf_to_gguf.py \
  ./qwen3-14b-tw-reasoning-merged \
  --outfile qwen3-14b.F16.gguf \
  --outtype f16

Available file(s):

  • qwen3-14b.F16.gguf

πŸš€ Usage Examples

llama.cpp (CLI, Hugging Face Hub)

llama-cli -hf <username>/<repo> -p "η‚Ίδ»€ιΊΌε€©η©Ίζ˜―θ—θ‰²ηš„οΌŸ"

llama.cpp (local file)

llama-cli -m qwen3-14b.F16.gguf -p "Why is the sky blue?"

LM Studio / KoboldCPP / Text-Generation-WebUI

  • Place qwen3-14b.F16.gguf into your models directory
  • Load it as a normal GGUF model and start chatting

Ollama

Example Modelfile (Qwen3 uses the ChatML prompt format, not Mistral's [INST] tags):

FROM qwen3-14b.F16.gguf
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

Create and run:

ollama create lydia-qwen3-14b -f Modelfile
ollama run lydia-qwen3-14b "解釋量子糾纏"


πŸ“Š Intended Use

  • Complex reasoning and explanation in Chinese
  • Instruction following and educational Q&A
  • Chain-of-thought–friendly tasks
  • Offline personal assistants and local agents

⚠️ Limitations

  • May produce verbose chain-of-thought unless instructed otherwise
  • Can hallucinate or be overconfident outside its training distribution
  • Not specifically optimized for coding or tool use
  • GGUF format cannot be further fine-tuned directly

πŸ™Œ Acknowledgements

  • Unsloth for efficient fine-tuning infrastructure
  • twinkle-ai for the tw-reasoning-instruct-50k dataset
  • The Qwen team for the Qwen3 architecture
  • The llama.cpp / GGUF ecosystem and community tooling (LM Studio, KoboldCPP, Ollama, etc.)