# SmolLM-135M-SFT-exp01

Supervised fine-tuning of [SmolLM-135M-CPT-LoRA-r32](https://huggingface.co/JaydeepR/SmolLM-135M-CPT-LoRA-r32) on 300K synthetic ML paper instruction pairs. The result is a structured research-assistant API for ML papers, not a general chatbot.

This is **exp01** in a series of SFT experiments on top of the CPT-adapted SmolLM-135M.

---

## Full Pipeline

```
arXiv ML papers (188)
        │
        ▼
text-albumentations
(chunking + constrained synthetic generation)
        │
        ▼
paperbd/paper_instructions_300K-v1
(300K instruction-response pairs)
        │
        ▼
SFT training (LoRA r=32, ChatML, train_on_responses_only)
        │
        ▼
SmolLM-135M-SFT-exp01
        │
        ▼
PaperResearcher API (10 structured tasks)
```

---

## Model Description

- **Base model:** `paperbd/smollm_135M_arxiv_cpt` (SmolLM-135M after continued pre-training on arXiv ML papers)
- **Method:** Supervised Fine-Tuning (SFT) with LoRA + `train_on_responses_only`
- **Domain:** ML/arXiv paper research tasks
- **Design:** Restricted API with 10 fixed task types, not a general chatbot

---

## Data Generation Pipeline

The training dataset was built from raw arXiv ML papers using a synthetic data generation pipeline:

### 1. Chunking

Raw paper text is split into overlapping 500-word chunks (100-word overlap) to create manageable context windows for generation.

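The chunking step can be sketched as a word-level sliding window. This is a minimal illustration that assumes words are whitespace-separated. The function name `chunk_words` is hypothetical and is not taken from `text-albumentations`.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into chunks of `chunk_size` words; consecutive chunks share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()  # assumption: whitespace-separated words
    step = chunk_size - overlap  # with the defaults, each chunk starts 400 words after the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    demo = " ".join(f"w{i}" for i in range(1200))
    print([len(c.split()) for c in chunk_words(demo)])
```

For a 1,200-word paper, this yields three chunks of 500, 500, and 400 words. The last 100 words of each chunk repeat at the start of the next, so a sentence cut at one boundary still appears whole in a neighbouring chunk.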
### 2. Augmentation with `text-albumentations`

Each chunk is passed through a set of stochastic augmentation tasks. Each task runs with 25% probability per chunk, which increases dataset diversity:

| Task | Description | Output type |
|---|---|---|
| `bullet_augmentation` | Extract key points as markdown bullets | `list[str]` |
| `qa_pair_augmentation` | Generate question-answer pairs | `list[QAPair]` |
| `rephrase_augmentation` | Elaborate and restate the passage | `str` |
| `continuation_augmentation` | Continue from a passage prefix | `str` |
| `triplet_augmentation` | Extract knowledge graph triplets | `list[Triplet]` |
| `retrieval_augmentation` | Cross-chunk: which passage answers a question | `RetrievalResult` |
| `comparison_augmentation` | Cross-chunk: compare two passages | `str` |

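The per-chunk sampling rule can be illustrated as one independent coin flip per task. This is a sketch under assumptions. The task names come from the table above, but `select_tasks` is hypothetical. The sketch also does not model how the real pipeline picks the second chunk needed by the cross-chunk tasks (retrieval, comparison).

```python
import random

# Task names as listed in the augmentation table.
TASKS = [
    "bullet_augmentation",
    "qa_pair_augmentation",
    "rephrase_augmentation",
    "continuation_augmentation",
    "triplet_augmentation",
    "retrieval_augmentation",
    "comparison_augmentation",
]


def select_tasks(rng: random.Random, p: float = 0.25) -> list[str]:
    """Keep each task independently with probability p (assumed independent draws per chunk)."""
    return [task for task in TASKS if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    plans = [select_tasks(rng) for _ in range(10_000)]
    counts = {t: sum(t in plan for plan in plans) for t in TASKS}
    print(counts)  # each count lands near 2,500
```

Under this independence assumption, a chunk receives 7 × 0.25 = 1.75 augmentations on average. About 13% of chunks (0.75^7 ≈ 0.133) receive none.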
### 3. Constrained Decoding via Outlines

All generation during data preparation uses **[Outlines](https://github.com/dottxt-ai/outlines)** for structured output. Outlines is a constrained-decoding library that guarantees the generator returns outputs matching a predefined schema (a Pydantic model or a regex). This ensures that:

- QA pairs always have valid `question` / `answer` fields
- Triplets always follow the `(subject, relation, object)` format
- Retrieval results always return a valid passage index

Default runtime: `mlx-community/Qwen3.5-4B-OptiQ-4bit` via MLX (Apple Silicon). Async and batch variants are available for large-scale generation.

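To make the three guarantees concrete, here is a sketch of what the target schemas might look like. The text above says the pipeline uses Pydantic models with Outlines. This sketch instead uses stdlib dataclasses and a post-hoc JSON check so that it runs without dependencies. It illustrates the output contract only, not Outlines' constrained decoding. The field names follow the Return Types table in the PaperResearcher API section, and `parse_as` is a hypothetical helper.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class Triplet:
    subject: str
    relation: str
    object: str  # field name follows the documented `.object` attribute


@dataclass
class RetrievalResult:
    index: int  # 0-based passage index
    reasoning: str


def parse_as(schema, raw: str, n_passages: Optional[int] = None):
    """Parse a JSON object and check it has exactly the schema's fields, with the right types."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    expected = {f.name: f.type for f in fields(schema)}
    if not isinstance(data, dict) or set(data) != set(expected):
        raise ValueError(f"expected keys {sorted(expected)}, got {data!r}")
    for name, typ in expected.items():
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, typ):
            raise ValueError(f"field {name!r} should be {typ.__name__}, got {value!r}")
    obj = schema(**data)
    if isinstance(obj, RetrievalResult) and n_passages is not None:
        if not 0 <= obj.index < n_passages:
            raise ValueError(f"index {obj.index} out of range for {n_passages} passages")
    return obj
```

With constrained decoding these checks should never fail, because the generator cannot emit tokens that violate the schema. A validator like this is mainly useful when the same prompts are run without Outlines.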
### 4. Dataset

The final dataset `paperbd/paper_instructions_300K-v1` contains **300K instruction-response pairs** across all task types and has been uploaded to Hugging Face for reuse.

---

## Training Details

| Setting | Value |
|---|---|
| Total steps | 11,355 |
| Sequence length | 2048 (packed) |
| Chat template | ChatML |
| Response-only training | Yes (loss on assistant turns only) |
| Data variations | 2 (conversation extension), giving ~600K effective examples |
| Hardware | NVIDIA RTX 4090 |
| Training time | ~10 hours |

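Response-only training means the loss ignores every token outside assistant turns. A common way to achieve this is to copy the input ids into the labels and overwrite non-assistant positions with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The sketch below is a simplified, hypothetical illustration on pre-split `(role, token_ids)` segments. It is not the code of `train_on_responses_only`, which works on the tokenized chat template directly.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments: list[tuple[str, list[int]]]) -> tuple[list[int], list[int]]:
    """Concatenate chat segments; keep labels only for assistant tokens, mask everything else."""
    input_ids: list[int] = []
    labels: list[int] = []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # the loss is computed on these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # system/user tokens serve as context only
    return input_ids, labels


if __name__ == "__main__":
    ids, labels = build_labels([("system", [1, 2]), ("user", [3, 4, 5]), ("assistant", [6, 7])])
    print(ids)
    print(labels)
```

The model still sees the system and user tokens as input. Only the gradient signal is restricted to the assistant's reply, so capacity is not spent learning to reproduce prompts.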
---

## Evaluation

### Method

1,000 samples were drawn from the `paper_instructions_300K-v1` test split. The fine-tuned model generates a response for each sample, and `grok-3-mini` then scores the responses as an LLM judge.

### Judge Prompt (4 dimensions, 1–5 scale)

- **Faithfulness:** Does the response contain only factually correct claims? Penalise hallucinations.
- **Answer Correctness:** How closely does the response match the ground truth semantically?
- **Relevance:** Does the response directly address what was asked, without padding or going off-topic?
- **Completeness:** Does the response cover the key points from the ground truth without omitting important details?

### Results

| Metric | Score (1–5) |
|---|---|
| Faithfulness | 2.70 |
| Answer Correctness | 1.98 |
| Relevance | **3.04** |
| Completeness | 1.85 |
| **Overall** | **2.39** |

**Interpretation:** Relevance is the strongest dimension: the model stays on topic. Answer correctness and completeness are limited by the 135M parameter count. The model understands the task structure but struggles to recall and reproduce factual content precisely.

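The card does not state how **Overall** is computed. The reported value is consistent with an unweighted mean of the four dimension scores: (2.70 + 1.98 + 3.04 + 1.85) / 4 = 2.3925 ≈ 2.39. Below is a sketch of that aggregation. It assumes the judge returns one 1–5 integer per dimension per sample; the `aggregate` helper and the record format are hypothetical.

```python
from statistics import mean

DIMENSIONS = ("faithfulness", "answer_correctness", "relevance", "completeness")


def aggregate(judgements: list[dict[str, int]]) -> dict[str, float]:
    """Average each dimension over samples, then take the unweighted mean of those averages."""
    scores = {d: mean(j[d] for j in judgements) for d in DIMENSIONS}
    scores["overall"] = mean(scores[d] for d in DIMENSIONS)
    return scores


if __name__ == "__main__":
    # Per-dimension means from the Results table; only the Overall row is recomputed here.
    reported = [2.70, 1.98, 3.04, 1.85]
    print(round(mean(reported), 2))  # 2.39
```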
---

## PaperResearcher API

The model is designed to be used as a structured API, not a free-form chatbot. The `PaperResearcher` class exposes 10 typed methods, each using the exact instruction strings the model was trained on:

```python
# Assumption: the return types are exported by the same module as PaperResearcher.
from paper_researcher import PaperResearcher, QAPair, RetrievalResult, Triplet

researcher = PaperResearcher("JaydeepR/SmolLM-135M-SFT-exp01")

passage = "Attention mechanisms compute weighted sums of values..."
# Illustrative passages for the two-passage methods (compare, find_relevant).
passage_a = passage
passage_b = "Convolutional layers apply the same learned filters at every spatial position..."

# Extract key points
bullets: list[str] = researcher.extract_bullets(passage)

# Generate Q&A pairs
pairs: list[QAPair] = researcher.generate_qa_pairs(passage)
# → [QAPair(question="What does attention compute?", answer="Weighted sums of values")]

# Extract knowledge graph triplets
triplets: list[Triplet] = researcher.extract_triplets(passage)
# → [Triplet(subject="attention", relation="computes", object="weighted sums")]

# Answer a question given a passage
answer: str = researcher.answer("What does attention compute?", passage)

# Rephrase and elaborate
rephrased: str = researcher.rephrase(passage)

# Continue a passage from its first 200 characters
continuation: str = researcher.continue_from(passage[:200])

# Extract a single key fact
fact: str = researcher.extract_fact(passage)

# Generate a question from a passage
question: str = researcher.generate_question(passage)

# Compare two passages
comparison: str = researcher.compare(passage_a, passage_b)

# Retrieval: which passage answers the question?
result: RetrievalResult = researcher.find_relevant(question, [passage_a, passage_b])
# → RetrievalResult(index=0, reasoning="Passage 1 directly defines...")
```

### Return Types

| Method | Return type | Description |
|---|---|---|
| `extract_bullets` | `list[str]` | Parsed bullet points |
| `generate_qa_pairs` | `list[QAPair]` | `.question` and `.answer` fields |
| `extract_triplets` | `list[Triplet]` | `.subject`, `.relation`, `.object` fields |
| `find_relevant` | `RetrievalResult` | `.index` (0-based), `.reasoning` |
| All others | `str` | Raw text response |

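`extract_bullets` is documented as returning parsed bullet points. The sketch below shows one plausible way to turn the model's markdown output into a `list[str]`. `parse_bullets` is a hypothetical helper, not the actual `paper_researcher` implementation. It assumes one bullet per line, prefixed with `-`, `*`, or `+`.

```python
import re

# A markdown list item: optional indentation, a bullet marker, at least one space, then the text.
BULLET_RE = re.compile(r"^\s*[-*+]\s+(.+?)\s*$")


def parse_bullets(text: str) -> list[str]:
    """Return the text of each markdown bullet line, skipping non-bullet lines."""
    items = []
    for line in text.splitlines():
        match = BULLET_RE.match(line)
        if match:
            items.append(match.group(1))
    return items


if __name__ == "__main__":
    raw = "Key points:\n- Attention computes weighted sums of values.\n- Weights come from query-key similarity."
    print(parse_bullets(raw))
```

Because the pattern requires whitespace after the marker, lines such as `---` or `**bold**` are not mistaken for bullets.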
---

## Raw Inference

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed IDs: the CPT base model listed under "Model Description", and this repository as the SFT LoRA adapter.
base_model_id = "paperbd/smollm_135M_arxiv_cpt"
adapter_id = "JaydeepR/SmolLM-135M-SFT-exp01"

# Assumption: the adapter repository includes the tokenizer with the ChatML chat template.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are an expert in AI and ML research. Your answers are concise and helpful."},
    {"role": "user", "content": "Extract the important points from this passage as markdown bullet points.\n\nAttention mechanisms..."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1, no_repeat_ngram_size=4)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

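The prompt above can also be assembled by hand, which helps when checking what the model actually sees. The sketch below builds the same `messages` list from the system prompt and bullet-point instruction shown above. It then renders the list in the standard ChatML layout, which `apply_chat_template(..., add_generation_prompt=True)` should produce for a ChatML template. `build_messages` and `render_chatml` are hypothetical helpers. The other nine instruction strings live in `tasks.py` and are not reproduced here, and the exact whitespace of the model's own template may differ.

```python
SYSTEM_PROMPT = "You are an expert in AI and ML research. Your answers are concise and helpful."
BULLETS_INSTRUCTION = "Extract the important points from this passage as markdown bullet points."


def build_messages(instruction: str, passage: str) -> list[dict[str, str]]:
    """Pair the system prompt with the instruction, a blank line, and the passage."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{instruction}\n\n{passage}"},
    ]


def render_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render each turn as '<|im_start|>role', newline, content, '<|im_end|>', newline."""
    text = "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"  # open the assistant turn for the model to complete
    return text


if __name__ == "__main__":
    print(render_chatml(build_messages(BULLETS_INSTRUCTION, "Attention mechanisms...")))
```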
---

## Limitations

- 135M-parameter model: limited factual recall and reasoning capacity
- Trained on synthetic data, so instruction format matters; use the exact prompts from `tasks.py`
- Relevance is the strongest dimension (3.04/5); correctness and completeness are weak (< 2/5)
- Best suited for structured extraction (bullets, triplets, QA) rather than open-ended generation
- No comparison against the uninstructed base model yet (exp02 planned)

---

## Related Models

| Model | Description |
|---|---|
| [JaydeepR/SmolLM-135M-CPT-LoRA-r32](https://huggingface.co/JaydeepR/SmolLM-135M-CPT-LoRA-r32) | CPT base (this model's starting point) |
| [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Original base model |

---

## Citation