---
language:
- en
license: apache-2.0
base_model: paperbd/smollm_135M_arxiv_cpt
tags:
- sft
- instruction-tuning
- lora
- unsloth
- scientific
- arxiv
- nlp
- paper-researcher
datasets:
- paperbd/paper_instructions_300K-v1
---
# SmolLM-135M-SFT-exp01
Supervised fine-tuning of [SmolLM-135M-CPT-LoRA-r32](https://huggingface.co/JaydeepR/SmolLM-135M-CPT-LoRA-r32) on 300K synthetic ML paper instruction pairs. The result is a structured research assistant API for ML papers, not a general chatbot.
This is **exp01** in a series of SFT experiments on top of the CPT-adapted SmolLM-135M.
---
## Full Pipeline
```
arXiv ML papers (188)
        │
        ▼
text-albumentations
(chunking + constrained synthetic generation)
        │
        ▼
paperbd/paper_instructions_300K-v1
(300K instruction-response pairs)
        │
        ▼
SFT training (LoRA r=32, ChatML, train_on_responses_only)
        │
        ▼
SmolLM-135M-SFT-exp01
        │
        ▼
PaperResearcher API (10 structured tasks)
```
---
## Model Description
- **Base model:** `paperbd/smollm_135M_arxiv_cpt` (SmolLM-135M after continued pre-training on arXiv ML papers)
- **Method:** Supervised Fine-Tuning (SFT) with LoRA + `train_on_responses_only`
- **Domain:** ML/arXiv paper research tasks
- **Design:** Restricted API with 10 fixed task types; not a general chatbot
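`train_on_responses_only` restricts the loss to assistant turns. A minimal sketch of what that means for the labels, assuming ChatML markers that each map to a single token (real tokenizers also emit a newline after the role). This illustrates the idea only; it is not Unsloth's implementation. Every token outside an assistant turn gets the label -100, which PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(tokens: list[str], token_ids: list[int]) -> list[int]:
    """Return labels that keep token ids only inside assistant turns.

    Assumes each ChatML turn starts with "<|im_start|>" followed by a single
    role token ("system", "user" or "assistant") and ends with "<|im_end|>".
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    in_assistant = False
    i = 0
    while i < len(tokens):
        if tokens[i] == "<|im_start|>":
            # The role token decides whether the following content is trained on.
            in_assistant = i + 1 < len(tokens) and tokens[i + 1] == "assistant"
            i += 2  # the turn header (marker + role) stays masked
            continue
        if in_assistant:
            labels[i] = token_ids[i]  # this sketch keeps <|im_end|> so the model learns to stop
            if tokens[i] == "<|im_end|>":
                in_assistant = False
        i += 1
    return labels


tokens = ["<|im_start|>", "user", "Summarise", "<|im_end|>",
          "<|im_start|>", "assistant", "-", "point", "<|im_end|>"]
labels = mask_non_assistant(tokens, list(range(len(tokens))))
print(labels)  # [-100, -100, -100, -100, -100, -100, 6, 7, 8]
```

Only the three assistant tokens contribute to the loss; the system and user turns still condition the model but are never predicted.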
---
## Data Generation Pipeline
The training dataset was built from raw arXiv ML papers using a synthetic data generation pipeline:
### 1. Chunking
Raw paper text is split into overlapping 500-word chunks (100-word overlap) to create manageable context windows for generation.
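A minimal sketch of the chunking step, assuming "words" means whitespace-separated tokens (the pipeline's exact word definition is not stated). With 500-word windows and a 100-word overlap, each new chunk advances 400 words.

```python
def chunk_words(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    stride = size - overlap  # 400 new words per chunk with the defaults
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(text)
print(len(chunks))  # 3 windows starting at words 0, 400 and 800
```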
### 2. Augmentation with `text-albumentations`
Each chunk is passed through a set of stochastic augmentation tasks. Each task fires with 25% probability per chunk, which adds diversity to the dataset:
| Task | Description | Output type |
|---|---|---|
| `bullet_augmentation` | Extract key points as markdown bullets | `list[str]` |
| `qa_pair_augmentation` | Generate question-answer pairs | `list[QAPair]` |
| `rephrase_augmentation` | Elaborate and restate the passage | `str` |
| `continuation_augmentation` | Continue from a passage prefix | `str` |
| `triplet_augmentation` | Extract knowledge graph triplets | `list[Triplet]` |
| `retrieval_augmentation` | Cross-chunk: which passage answers a question | `RetrievalResult` |
| `comparison_augmentation` | Cross-chunk: compare two passages | `str` |
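A sketch of how the 25% sampling could be planned per chunk. Assumptions not stated above: each task is drawn independently, and cross-chunk tasks pair the current chunk with a uniformly random other chunk. Under the independence assumption, a chunk receives 7 × 0.25 = 1.75 tasks on average.

```python
import random
from typing import Optional

# Task names from the table; the last two need a second chunk.
SINGLE_CHUNK_TASKS = [
    "bullet_augmentation",
    "qa_pair_augmentation",
    "rephrase_augmentation",
    "continuation_augmentation",
    "triplet_augmentation",
]
CROSS_CHUNK_TASKS = ["retrieval_augmentation", "comparison_augmentation"]


def plan_jobs(num_chunks: int, p: float = 0.25, seed: int = 0) -> list[tuple[str, int, Optional[int]]]:
    """Return (task, chunk_index, partner_index) jobs; partner_index is None for single-chunk tasks."""
    rng = random.Random(seed)  # seeded so the plan is reproducible
    jobs: list[tuple[str, int, Optional[int]]] = []
    for i in range(num_chunks):
        for task in SINGLE_CHUNK_TASKS + CROSS_CHUNK_TASKS:
            if rng.random() >= p:
                continue  # task skipped for this chunk
            if task in CROSS_CHUNK_TASKS:
                if num_chunks < 2:
                    continue  # no second chunk to pair with
                partner = rng.randrange(num_chunks - 1)
                if partner >= i:
                    partner += 1  # skip over the chunk itself
                jobs.append((task, i, partner))
            else:
                jobs.append((task, i, None))
    return jobs


jobs = plan_jobs(num_chunks=10_000)
print(len(jobs) / 10_000)  # close to 1.75 tasks per chunk
```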
### 3. Constrained Decoding via Outlines
All generation during data prep uses **[Outlines](https://github.com/dottxt-ai/outlines)** for structured output. Outlines is a constrained decoding library that guarantees the generator returns outputs matching a predefined schema (Pydantic model or regex). This ensures:
- QA pairs always have valid `question` / `answer` fields
- Triplets always follow `(subject, relation, object)` format
- Retrieval results always return a valid passage index
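Outlines enforces the schema during decoding, so malformed outputs are never produced. To make concrete what that guarantee covers, here is a dependency-free post-hoc checker. It is a stand-in for the Pydantic models the pipeline actually uses. Two details are assumptions of this sketch: outputs must contain exactly the listed fields, and retrieval indices are 0-based, as in the `find_relevant` API.

```python
import json
from typing import Optional

# Field names and types follow the documented return types (QAPair, Triplet, RetrievalResult).
SCHEMAS: dict[str, dict[str, type]] = {
    "QAPair": {"question": str, "answer": str},
    "Triplet": {"subject": str, "relation": str, "object": str},
    "RetrievalResult": {"index": int, "reasoning": str},
}


def is_valid(kind: str, raw: str, num_passages: Optional[int] = None) -> bool:
    """Check that `raw` is a JSON object with exactly the fields and types required by `kind`."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    schema = SCHEMAS[kind]
    if not isinstance(obj, dict) or set(obj) != set(schema):
        return False
    for field, expected in schema.items():
        value = obj[field]
        # bool is a subclass of int in Python, so reject it explicitly for integer fields
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return False
    if kind == "RetrievalResult" and num_passages is not None:
        return 0 <= obj["index"] < num_passages  # index must point at a real passage
    return True


print(is_valid("QAPair", '{"question": "What does attention compute?", "answer": "Weighted sums"}'))  # True
print(is_valid("RetrievalResult", '{"index": 2, "reasoning": "..."}', num_passages=2))  # False
```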
Default runtime: `mlx-community/Qwen3.5-4B-OptiQ-4bit` via MLX (Apple Silicon). Async and batch variants are available for large-scale generation.
### 4. Dataset
The final dataset `paperbd/paper_instructions_300K-v1` contains **300K instruction-response pairs** across all task types, uploaded to HuggingFace for reuse.
---
## Training Details
| Parameter | Value |
|---|---|
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | ~9.7M / 144M (6.77%) |
| Quantization | 4-bit (QLoRA via Unsloth) |
| Batch size | 32 |
| Gradient accumulation | 4 (effective batch: 128) |
| Learning rate | 2e-4 (linear decay) |
| Warmup ratio | 0.03 |
| Epochs | 3 |
| Total steps | 11,355 |
| Sequence length | 2048 (packed) |
| Chat template | ChatML |
| Response-only training | Yes (loss on assistant turns only) |
| Data variations | 2 (conversation extension) → ~600K effective examples |
| Hardware | NVIDIA RTX 4090 |
| Training time | ~10 hours |
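A few lines of arithmetic reproduce the table's derived numbers. This sketch assumes the SmolLM-135M shapes from the public `HuggingFaceTB/SmolLM-135M` config: hidden size 576, MLP size 1536, 30 layers, 9 query heads and 3 key/value heads. With those shapes, rank-32 adapters on the seven listed projections come to 9,768,960 parameters, consistent with the ~9.7M in the table. With alpha equal to rank, the standard LoRA scaling alpha/r is 1.0. The schedule follows the linear warmup-then-decay shape. The warmup count rounds 0.03 × 11,355 = 340.65 up to 341, the convention used by Hugging Face `TrainingArguments` for `warmup_ratio`.

```python
import math

# SmolLM-135M shapes (assumed from the public HuggingFaceTB/SmolLM-135M config).
HIDDEN, INTERMEDIATE, LAYERS = 576, 1536, 30
NUM_HEADS, NUM_KV_HEADS = 9, 3
HEAD_DIM = HIDDEN // NUM_HEADS     # 64
KV_DIM = NUM_KV_HEADS * HEAD_DIM   # 192 (grouped-query attention)
R = 32                             # LoRA rank from the table


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to one frozen linear layer."""
    return r * d_in + d_out * r


per_layer = (
    lora_params(HIDDEN, HIDDEN, R)          # q_proj
    + lora_params(HIDDEN, KV_DIM, R)        # k_proj
    + lora_params(HIDDEN, KV_DIM, R)        # v_proj
    + lora_params(HIDDEN, HIDDEN, R)        # o_proj
    + lora_params(HIDDEN, INTERMEDIATE, R)  # gate_proj
    + lora_params(HIDDEN, INTERMEDIATE, R)  # up_proj
    + lora_params(INTERMEDIATE, HIDDEN, R)  # down_proj
)
trainable = per_layer * LAYERS

effective_batch = 32 * 4  # per-device batch x gradient accumulation
TOTAL_STEPS, WARMUP_RATIO, PEAK_LR = 11_355, 0.03, 2e-4
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - warmup_steps))


print(trainable, effective_batch, warmup_steps)  # 9768960 128 341
```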
---
## Evaluation
### Method
1000 samples drawn from the `paper_instructions_300K-v1` test split. The fine-tuned model generates responses, which are then scored by `grok-3-mini` as an LLM judge.
### Judge Prompt (4 dimensions, 1-5 scale)
- **Faithfulness:** Does the response contain only factually correct claims? Penalise hallucinations.
- **Answer Correctness:** How closely does the response match the ground truth semantically?
- **Relevance:** Does the response directly address what was asked, without padding or going off-topic?
- **Completeness:** Does the response cover the key points from the ground truth without omitting important details?
### Results
| Metric | Score (1-5) |
|---|---|
| Faithfulness | 2.70 |
| Answer Correctness | 1.98 |
| Relevance | **3.04** |
| Completeness | 1.85 |
| **Overall** | **2.39** |
**Interpretation:** Relevance is the strongest dimension: the model stays on topic. Answer correctness and completeness appear limited by the 135M parameter count; the model follows the task structure but struggles to recall and reproduce factual content precisely.
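The card does not state how the Overall score is aggregated. An unweighted mean of the four dimensions gives (2.70 + 1.98 + 3.04 + 1.85) / 4 = 2.3925, which is consistent with the reported 2.39. A minimal sketch of that aggregation over per-sample judge scores follows. It assumes the judge returns one integer per dimension, and the two example judgements are made up for illustration.

```python
from statistics import mean

DIMENSIONS = ("faithfulness", "answer_correctness", "relevance", "completeness")


def aggregate(judgements: list[dict[str, int]]) -> dict[str, float]:
    """Average each 1-5 judge dimension over samples, plus an unweighted overall mean."""
    for j in judgements:
        for d in DIMENSIONS:
            if not 1 <= j[d] <= 5:
                raise ValueError(f"{d} score out of range: {j[d]}")
    per_dim = {d: mean(j[d] for j in judgements) for d in DIMENSIONS}
    per_dim["overall"] = mean(per_dim[d] for d in DIMENSIONS)
    return per_dim


# Two made-up judge outputs, for illustration only.
example = [
    {"faithfulness": 3, "answer_correctness": 2, "relevance": 4, "completeness": 2},
    {"faithfulness": 2, "answer_correctness": 2, "relevance": 3, "completeness": 1},
]
print(aggregate(example))
```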
---
## PaperResearcher API
The model is designed to be used as a structured API, not a free-form chatbot. The `PaperResearcher` class exposes 10 typed methods, each using the exact instruction strings the model was trained on:
```python
from __future__ import annotations  # keeps the type annotations below unevaluated at runtime

from paper_researcher import PaperResearcher

# QAPair, Triplet and RetrievalResult are the structured return types listed under "Return Types"
researcher = PaperResearcher("JaydeepR/SmolLM-135M-SFT-exp01")

passage = "Attention mechanisms compute weighted sums of values..."
passage_a = passage
passage_b = "Convolutional layers apply learned filters to local regions of the input..."  # example second passage

# Extract key points
bullets: list[str] = researcher.extract_bullets(passage)
# Generate Q&A pairs
pairs: list[QAPair] = researcher.generate_qa_pairs(passage)
# → [QAPair(question="What does attention compute?", answer="Weighted sums of values")]
# Extract knowledge graph triplets
triplets: list[Triplet] = researcher.extract_triplets(passage)
# → [Triplet(subject="attention", relation="computes", object="weighted sums")]
# Answer a question given a passage
answer: str = researcher.answer("What does attention compute?", passage)
# Rephrase and elaborate
rephrased: str = researcher.rephrase(passage)
# Continue a passage from its beginning
continuation: str = researcher.continue_from(passage[:200])
# Extract a single key fact
fact: str = researcher.extract_fact(passage)
# Generate a question from a passage
question: str = researcher.generate_question(passage)
# Compare two passages
comparison: str = researcher.compare(passage_a, passage_b)
# Retrieval: which passage answers the question?
result: RetrievalResult = researcher.find_relevant(question, [passage_a, passage_b])
# → RetrievalResult(index=0, reasoning="Passage 1 directly defines...")
```
### Return Types
| Method | Return Type | Description |
|---|---|---|
| `extract_bullets` | `list[str]` | Parsed bullet points |
| `generate_qa_pairs` | `list[QAPair]` | `.question` and `.answer` fields |
| `extract_triplets` | `list[Triplet]` | `.subject`, `.relation`, `.object` fields |
| `find_relevant` | `RetrievalResult` | `.index` (0-based), `.reasoning` |
| All others | `str` | Raw text response |
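The table names the fields but not how raw completions become typed objects. Below is a hypothetical sketch, not the actual `paper_researcher` code, of the containers and of the bullet parsing behind `extract_bullets`. It assumes the model emits markdown list lines.

```python
import re
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class Triplet:
    subject: str
    relation: str
    object: str  # mirrors the documented `.object` field


@dataclass
class RetrievalResult:
    index: int  # 0-based position in the list of candidate passages
    reasoning: str


# A markdown bullet: "-", "*", "+" or "1." / "1)" followed by whitespace and some text.
_BULLET = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+(.*\S)\s*$")


def parse_bullets(text: str) -> list[str]:
    """Keep the text of markdown bullet lines and drop every other line."""
    return [m.group(1) for line in text.splitlines() if (m := _BULLET.match(line))]


print(parse_bullets("- Attention computes weighted sums\n- Values are mixed by scores"))
```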
---
## Raw Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

adapter_id = "JaydeepR/SmolLM-135M-SFT-exp01"
base_model_id = "paperbd/smollm_135M_arxiv_cpt"

# Tokenizer saved alongside the adapter (expected to carry the ChatML chat template used in SFT)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
# Load the CPT base model, then attach the SFT LoRA adapter on top of it
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are an expert in AI and ML research. Your answers are concise and helpful."},
    {"role": "user", "content": "Extract the important points from this passage as markdown bullet points.\n\nAttention mechanisms..."},
]

# Render the conversation and open an assistant turn for the model to complete
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
# Generate up to 256 new tokens while discouraging repetition
outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1, no_repeat_ngram_size=4)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
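`apply_chat_template(..., add_generation_prompt=True)` renders the messages with the tokenizer's chat template. A minimal sketch of the standard ChatML layout follows, which can help when inspecting prompts or building them outside `transformers`. It assumes this tokenizer's template matches plain ChatML exactly; printing `tokenizer.chat_template` would confirm it.

```python
def to_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages in the standard ChatML layout."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "".join(parts)


print(to_chatml([{"role": "user", "content": "Extract the important points..."}]))
```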
---
## Limitations
- 135M-parameter model: limited factual recall and reasoning capacity
- Trained on synthetic data, so instruction format matters; use the exact prompts from `tasks.py`
- Relevance is the strongest dimension (3.04/5); answer correctness (1.98/5) and completeness (1.85/5) are both below 2/5
- Best suited for structured extraction (bullets, triplets, QA) rather than open-ended generation
- No comparison against the base model before instruction tuning yet; planned for exp02
---
## Related Models
| Model | Description |
|---|---|
| [JaydeepR/SmolLM-135M-CPT-LoRA-r32](https://huggingface.co/JaydeepR/SmolLM-135M-CPT-LoRA-r32) | CPT base (this model's starting point) |
| [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Original base model |
---
## Citation
```
@misc{smollm135m-sft-exp01,
author = {Jaydeep Raijada},
title = {SmolLM-135M SFT exp01: Instruction Tuning on ML Paper Research Tasks},
year = {2026},
url = {https://huggingface.co/JaydeepR/SmolLM-135M-SFT-exp01}
}
```