Qwen3-VL-4B-Instruct-FT-SFT: The "Iron Logic" Prototype 🧠
"Order arising from Chaos." — A paradigm shift in LLM Instruction Tuning.
Model Author: aifeifei798
Base Model: Qwen/Qwen3-VL-4B-Instruct
Training Method: Fragmented Training (FT)
FT Dataset: TeichAI/gemini-3-pro-preview-high-reasoning-250x
SFT Dataset: TeichAI/gemini-3-pro-preview-high-reasoning-1000x
Thanks to mradermacher for creating the GGUF versions of these models:
https://huggingface.co/mradermacher/Qwen3-VL-4B-Instruct-FT-SFT-GGUF
https://huggingface.co/mradermacher/Qwen3-VL-4B-Instruct-FT-SFT-i1-GGUF
🌟 Introduction: The "FT" Revolution
Qwen3-VL-4B-Instruct-FT-SFT is not just another fine-tune. It is the first public proof-of-concept for the Fragmented Training (FT) paradigm.
Trained on only 248 examples with extreme noise injection, this model demonstrates that Logic can be decoupled from Syntax. It achieves what was previously thought to require thousands of samples: Emergent, Autonomous Chain-of-Thought (CoT).
This model represents Stage 2 of the Iron Logic Pipeline:
- Base Model
- ➡️ FT (Logic Injection) [THIS MODEL]
- ➡️ Standard SFT
Note: As a raw "Logic Injection" checkpoint, this model exhibits powerful reasoning but retains some artifacts (see Limitations). It is intended to demonstrate the sheer power of the FT technique.
⚡ The "Fragmented Training" (FT) Technology
Why does this model work with only 248 examples?
Standard SFT relies on Linear Pattern Matching (predicting token $t$ based on $t-1$). This often leads to rote memorization. Fragmented Training (FT) destroys this linearity.
The "Cognitive Burden" Protocol
- 70% Input Shuffling: We randomly shuffled 70% of the tokens in the User Instruction and Context.
- Pristine Output: The model was forced to generate a perfect, logical CoT response from this "chaos."
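The shuffling step above can be sketched in a few lines of plain Python. This is a hypothetical illustration only: the card does not publish the actual preprocessing code, so the function name `fragment_tokens`, the seeding, and the exact selection of positions are assumptions.

```python
import random

def fragment_tokens(tokens, shuffle_ratio=0.7, seed=0):
    """Randomly scramble a fraction of the input tokens.

    Hypothetical sketch of the FT noise-injection step: pick
    `shuffle_ratio` of the positions and permute their values,
    leaving the remaining tokens in place.
    """
    rng = random.Random(seed)
    n = len(tokens)
    k = int(n * shuffle_ratio)
    idx = rng.sample(range(n), k)          # positions to scramble
    values = [tokens[i] for i in idx]
    rng.shuffle(values)                    # permute only those values
    out = list(tokens)
    for i, v in zip(idx, values):
        out[i] = v
    return out

prompt = "write a snake game in python using pygame".split()
print(fragment_tokens(prompt))
```

The target side of each training pair stays untouched: the model sees the scrambled instruction but is trained to emit the pristine CoT response.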
Why FT is a Game Changer:
- Global Semantic Reconstruction: The model cannot rely on grammar or position. It must scan the entire context to extract semantic intent. This creates "Iron Logic" that is robust to noise.
- Extreme Data Efficiency: Under this cognitive load, 1 FT sample $\approx$ 100 SFT samples. The gradients generated are far richer, allowing the model to "grok" complex reasoning patterns (like CoT) instantly.
- Confidence Sharpening: The model's probability distribution becomes extremely decisive, leading to a ~30% speedup in inference time for non-thinking tasks.
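One common way to quantify "decisiveness" is the entropy of the next-token distribution: a sharpened model concentrates probability mass on few tokens, giving low entropy. The snippet below only illustrates the metric on synthetic distributions; the ~30% speedup figure is the author's claim and is not reproduced here.

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

sharp = [0.97, 0.01, 0.01, 0.01]   # a "decisive" next-token distribution
flat = [0.25, 0.25, 0.25, 0.25]    # a maximally uncertain one

print(entropy(sharp))  # ≈ 0.242 bits
print(entropy(flat))   # exactly 2.0 bits
```

Comparing average per-step entropy before and after FT would be one way to test the confidence-sharpening claim empirically.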
⚠️ Limitations & The "Context Bleeding" Artifact
Because this model is the result of Stage 2 (Logic Injection), it has learned that "Global Semantics > Local Position." This leads to a unique side effect:
- Context Bleeding: The model may pull strong semantic concepts from previous unrelated context (e.g., mentioning "pipes" from a previous game when coding a new one).
- Logic Loops: In the <code>&lt;think&gt;</code> phase, the model may occasionally repeat its reasoning steps (as seen in the demo above).
- Why? The FT process taught the model to aggressively hunt for semantic keywords anywhere in the input.
Think of this model as a raw diamond: The hardest part (Logic) is done; the polishing (SFT) comes next.
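Until the SFT polish lands, downstream users may want to strip the reasoning block before displaying output. A minimal post-processing sketch, assuming the model wraps its reasoning in standard `<think>…</think>` tags (implied by this card but not formally specified):

```python
import re

# Match a <think>...</think> block, including newlines, plus trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from model output."""
    return THINK_RE.sub("", text).strip()

raw = "<think>step 1... step 1 again...</think>The image shows a beach."
print(strip_think(raw))  # The image shows a beach.
```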
💻 Usage Code
```python
import torch
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

# Default: load the model on the available device(s)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "aifeifei798/Qwen3-VL-4B-Instruct-FT-SFT", dtype="auto", device_map="auto"
)

# We recommend enabling flash_attention_2 for better acceleration and memory
# saving, especially in multi-image and video scenarios.
# model = Qwen3VLForConditionalGeneration.from_pretrained(
#     "aifeifei798/Qwen3-VL-4B-Instruct-FT-SFT",
#     dtype=torch.bfloat16,
#     attn_implementation="flash_attention_2",
#     device_map="auto",
# )

processor = AutoProcessor.from_pretrained("aifeifei798/Qwen3-VL-4B-Instruct-FT-SFT")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
📜 Citation
The Fragmented Training methodology is the core contribution of this work. If you use this technique or model, please cite:
```bibtex
@misc{aifeifei_2026,
  author    = { aifeifei },
  title     = { Fragmented-Training (Revision bb381c6) },
  year      = 2026,
  url       = { https://huggingface.co/aifeifei798/Fragmented-Training },
  doi       = { 10.57967/hf/7592 },
  publisher = { Hugging Face }
}
```
Created with the Iron Logic Pipeline by aifeifei798.
Disclaimer and User Agreement
- Introduction
Thank you for your interest in accessing this model (“the Model”).
Before you access, download, or use the Model or any derivative works, please read and understand this Disclaimer and User Agreement (“Agreement”).
By checking “I have read and agree” and accessing the Model, you acknowledge that you have read, understood, and agreed to all terms of this Agreement.
If you do not agree with any part of this Agreement, do not request or use the Model.
- Nature of the Model & Risk Notice
The Model is trained using large-scale machine learning techniques and may generate inaccurate, false, offensive, violent, sexual, discriminatory, politically sensitive, or otherwise uncontrolled content.
The Model does not guarantee the accuracy, completeness, or legality of any generated content. You must independently evaluate and verify the outputs, and you assume all risks arising from their use.
The Model may reflect biases or errors present in its training data, potentially producing inappropriate or controversial outputs.
- License and Permitted Use
You may use the Model solely for lawful, compliant, and non-malicious purposes in research, learning, experimentation, and development, in accordance with applicable laws and regulations.
You must not use the Model for activities including, but not limited to:
Creating, distributing, or promoting unlawful, violent, pornographic, terrorist, discriminatory, defamatory, or privacy-invasive content;
Any activity that could cause significant negative impact on individuals, groups, organizations, or society;
High-risk applications such as automated decision-making, medical diagnosis, financial transactions, or legal advice without proper validation and human oversight.
You must not remove, alter, or circumvent any safety mechanisms implemented in the Model.
- Data and Privacy
You are solely responsible for any data processed or generated when using the Model, including compliance with data protection and privacy regulations.
The Model’s authors and contributors make no guarantees or warranties regarding data security or privacy.
- Limitation of Liability
To the maximum extent permitted by applicable law, the authors, contributors, and their affiliated institutions shall not be liable for any direct, indirect, incidental, or consequential damages arising from the use of the Model.
You agree to bear full legal responsibility for any disputes, claims, or litigation arising from your use of the Model, and you release the authors and contributors from any related liability.
- Updates and Termination
This Agreement may be updated at any time, with updates posted on the Model’s page and effective immediately upon publication.
If you violate this Agreement, the authors reserve the right to revoke your access to the Model at any time.
I have read and fully understand this Disclaimer and User Agreement, and I accept full responsibility for any consequences arising from my use of the Model.