# LydiaTM-Qwen3-14B – TW Reasoning (Merged GGUF)

This repository contains the GGUF export of a fine-tuned Qwen3-14B model, trained with Unsloth and TRL's SFTTrainer on the high-quality twinkle-ai/tw-reasoning-instruct-50k dataset.
The model specializes in reasoning, chain-of-thought, and instruction following in Chinese and English.
The model is merged: the LoRA adapters were fused into the base model before GGUF conversion. It runs fully offline and is compatible with llama.cpp, LM Studio, KoboldCPP, and Ollama.
## Model Summary
| Property | Value |
|---|---|
| Base Model | unsloth/Qwen3-14B-unsloth-bnb-4bit |
| Format | GGUF |
| Precision | F16 |
| Type | Text-only LLM |
| Training Method | QLoRA (via Unsloth) |
| Dataset | twinkle-ai/tw-reasoning-instruct-50k |
| Languages | Chinese (primary), English (secondary) |
| Specialization | Reasoning, chain-of-thought, instruction |
## Fine-Tuning Details

### Training Framework
- Unsloth for fast & memory-efficient QLoRA fine-tuning
- TRL (SFTTrainer) for supervised instruction tuning
- BitsAndBytes for 4-bit quantization and 8-bit optimizers
- Accelerate for distributed / device management
### LoRA Configuration

```yaml
r: 16
lora_alpha: 16
lora_dropout: 0.0
target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj
```
### Sequence Length

```python
max_seq_length = 2048
```
Note: the underlying Qwen3 architecture supports long-context inference up to 128K tokens. Fine-tuning was done at a 2K sequence length for efficiency, but the model can still be run with larger contexts at inference time.
### Training Data Structure
Each example in tw-reasoning-instruct-50k contains:

- `input` → user question / instruction
- `think` → chain-of-thought reasoning
- `output` → final answer

During SFT, the assistant response was formatted roughly as `{reasoning}` followed by `{final_answer}`. This encourages better reasoning while still allowing users to suppress chain-of-thought at inference with appropriate prompting.
## GGUF Conversion
The merged HF model was converted using convert-hf-to-gguf.py (named convert_hf_to_gguf.py in recent llama.cpp versions), for example:
```shell
python convert-hf-to-gguf.py \
  ./qwen3-14b-tw-reasoning-merged \
  --outfile qwen3-14b.F16.gguf \
  --outtype f16
```
Available file(s):
- qwen3-14b.F16.gguf
## Usage Examples

### llama.cpp (CLI, Hugging Face Hub)
```shell
llama-cli --hf <username>/<repo> -p "為什麼天空是藍色的？"   # "Why is the sky blue?"
```
### llama.cpp (local file)

```shell
llama-cli -m qwen3-14b.F16.gguf -p "Why is the sky blue?"
```
### LM Studio / KoboldCPP / Text-Generation-WebUI

- Place `qwen3-14b.F16.gguf` into your models directory
- Load it as a normal GGUF model and start chatting
### Ollama

Example Modelfile (Qwen3 uses the ChatML prompt format, so the template wraps the prompt in `<|im_start|>`/`<|im_end|>` markers rather than Mistral-style `[INST]` tags):

```
FROM qwen3-14b.F16.gguf
TEMPLATE """<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```
Create and run:

```shell
ollama create lydia-qwen3-14b -f Modelfile
ollama run lydia-qwen3-14b "解釋量子糾纏"   # "Explain quantum entanglement"
```
## Intended Use
- Complex reasoning and explanation in Chinese
- Instruction following and educational Q&A
- Chain-of-thought-friendly tasks
- Offline personal assistants and local agents
## Limitations
- May produce verbose chain-of-thought unless instructed otherwise
- Can hallucinate or be overconfident outside its training distribution
- Not specifically optimized for coding or tool use
- GGUF format cannot be further fine-tuned directly
## Acknowledgements
- Unsloth for efficient fine-tuning infrastructure
- twinkle-ai for the tw-reasoning-instruct-50k dataset
- The Qwen team for the Qwen3 architecture
- The llama.cpp / GGUF ecosystem and community tooling (LM Studio, KoboldCPP, Ollama, etc.)