| language: | |
| - en | |
| license: apache-2.0 | |
| base_model: paperbd/smollm_135M_arxiv_cpt | |
| tags: | |
| - sft | |
| - instruction-tuning | |
| - lora | |
| - unsloth | |
| - scientific | |
| - arxiv | |
| - nlp | |
| - paper-researcher | |
| datasets: | |
| - paperbd/paper_instructions_300K-v1 | |
| # SmolLM-135M-SFT-exp01 | |
| Supervised fine-tuning of [SmolLM-135M-CPT-LoRA-r32](https://huggingface.co/JaydeepR/SmolLM-135M-CPT-LoRA-r32) on 300K synthetic ML paper instruction pairs. The result is a structured research assistant API for ML papers, not a general chatbot. | |
| This is **exp01** in a series of SFT experiments on top of the CPT-adapted SmolLM-135M. | |
| --- | |
| ## Full Pipeline | |
| ``` | |
| arXiv ML papers (188) | |
| │ | |
| ▼ | |
| text-albumentations | |
| (chunking + constrained synthetic generation) | |
| │ | |
| ▼ | |
| paperbd/paper_instructions_300K-v1 | |
| (300K instruction-response pairs) | |
| │ | |
| ▼ | |
| SFT training (LoRA r=32, ChatML, train_on_responses_only) | |
| │ | |
| ▼ | |
| SmolLM-135M-SFT-exp01 | |
| │ | |
| ▼ | |
| PaperResearcher API (10 structured tasks) | |
| ``` | |
| --- | |
| ## Model Description | |
| - **Base model:** `paperbd/smollm_135M_arxiv_cpt` (SmolLM-135M after continued pre-training on arXiv ML papers) | |
| - **Method:** Supervised Fine-Tuning (SFT) with LoRA + `train_on_responses_only` | |
| - **Domain:** ML/arXiv paper research tasks | |
| - **Design:** Restricted API with 10 fixed task types, not a general chatbot | |
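As a rough illustration of what response-only training implies, the sketch below replaces every non-assistant token label with -100, the default ignore index of PyTorch's cross-entropy loss, so that only assistant tokens contribute to the loss. The `mask_non_assistant` helper and the per-token role list are hypothetical simplifications and are not Unsloth's implementation, which operates on the tokenized chat template.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default


def mask_non_assistant(token_ids: list[int], roles: list[str]) -> list[int]:
    """Return labels in which only assistant tokens keep their token id."""
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Toy example: two system tokens, three user tokens, three assistant tokens.
token_ids = [11, 12, 21, 22, 23, 31, 32, 33]
roles = ["system"] * 2 + ["user"] * 3 + ["assistant"] * 3
labels = mask_non_assistant(token_ids, roles)
print(labels)
```

The system and user positions end up as -100, so gradients come only from the assistant turn.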
| --- | |
| ## Data Generation Pipeline | |
| The training dataset was built from raw arXiv ML papers using a synthetic data generation pipeline: | |
| ### 1. Chunking | |
| Raw paper text is split into overlapping 500-word chunks (100-word overlap) to create manageable context windows for generation. | |
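A minimal sketch of the chunking step, assuming whitespace-delimited words and a fixed stride of 400 words (500 minus the 100-word overlap). The `chunk_words` function is a hypothetical stand-in; the pipeline's actual splitter may tokenize or handle the final chunk differently.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into chunks of chunk_size words that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), stride):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Synthetic 1,200-word "paper" to show the resulting chunk sizes.
paper_text = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(paper_text)
print(len(chunks), [len(c.split()) for c in chunks])  # prints: 3 [500, 500, 400]
```

The last 100 words of each chunk reappear as the first 100 words of the next one, which keeps sentences that straddle a boundary visible to the generator in at least one chunk.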
| ### 2. Augmentation with `text-albumentations` | |
| Each chunk is passed through stochastic augmentation tasks. Each task runs with 25% probability per chunk, ensuring dataset diversity: | |
| | Task | Description | Output type | | |
| |---|---|---| | |
| | `bullet_augmentation` | Extract key points as markdown bullets | `list[str]` | | |
| | `qa_pair_augmentation` | Generate question-answer pairs | `list[QAPair]` | | |
| | `rephrase_augmentation` | Elaborate and restate the passage | `str` | | |
| | `continuation_augmentation` | Continue from a passage prefix | `str` | | |
| | `triplet_augmentation` | Extract knowledge graph triplets | `list[Triplet]` | | |
| | `retrieval_augmentation` | Cross-chunk: which passage answers a question | `RetrievalResult` | | |
| | `comparison_augmentation` | Cross-chunk: compare two passages | `str` | | |
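The per-chunk sampling can be sketched as follows, assuming each of the seven tasks is drawn independently with probability 0.25. The `sample_tasks` helper is hypothetical and does not reflect the `text-albumentations` API; it only illustrates the selection behaviour described above.

```python
import random

TASKS = [
    "bullet_augmentation",
    "qa_pair_augmentation",
    "rephrase_augmentation",
    "continuation_augmentation",
    "triplet_augmentation",
    "retrieval_augmentation",
    "comparison_augmentation",
]


def sample_tasks(rng: random.Random, p: float = 0.25) -> list[str]:
    """Independently keep each task with probability p for one chunk."""
    return [task for task in TASKS if rng.random() < p]


rng = random.Random(0)  # seeded so the sketch is reproducible
selections = [sample_tasks(rng) for _ in range(10_000)]
mean_tasks = sum(len(s) for s in selections) / len(selections)
# Under independent sampling the theoretical mean is 7 * 0.25 = 1.75 tasks per chunk.
print(f"average tasks per chunk: {mean_tasks:.2f}")
```

Under this assumption some chunks receive no task at all (probability 0.75^7, about 13%), while others receive several, which is one source of variety in the resulting dataset.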
| ### 3. Constrained Decoding via Outlines | |
| All generation during data prep uses **[Outlines](https://github.com/dottxt-ai/outlines)** for structured output. Outlines is a constrained decoding library that guarantees the generator returns outputs matching a predefined schema (Pydantic model or regex). This ensures: | |
| - QA pairs always have valid `question` / `answer` fields | |
| - Triplets always follow `(subject, relation, object)` format | |
| - Retrieval results always return a valid passage index | |
| Default runtime: `mlx-community/Qwen3.5-4B-OptiQ-4bit` via MLX (Apple Silicon). Async and batch variants available for large-scale generation. | |
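The schemas themselves are not shown in this card. As a hedged illustration of the shapes involved, the sketch below uses standard-library dataclasses as a stand-in for the Pydantic models, with field names taken from the Return Types table in this card. It validates a JSON string after the fact; it does not perform constrained decoding, which Outlines applies at the token level during generation. The `parse_qa_pairs` helper is hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


@dataclass(frozen=True)
class Triplet:
    subject: str
    relation: str
    object: str


def parse_qa_pairs(raw_json: str) -> list[QAPair]:
    """Parse a JSON array of {"question", "answer"} objects, rejecting anything else."""
    items = json.loads(raw_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    pairs = []
    for item in items:
        if not isinstance(item, dict) or set(item) != {"question", "answer"}:
            raise ValueError(f"unexpected item: {item!r}")
        if not all(isinstance(v, str) for v in item.values()):
            raise ValueError("fields must be strings")
        pairs.append(QAPair(**item))
    return pairs


raw = '[{"question": "What does attention compute?", "answer": "Weighted sums of values"}]'
pairs = parse_qa_pairs(raw)
print(pairs[0].question)
```

With constrained decoding, malformed items like the ones rejected here cannot be produced in the first place, so no retry or repair loop is needed during data generation.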
| ### 4. Dataset | |
| The final dataset `paperbd/paper_instructions_300K-v1` contains **300K instruction-response pairs** across all task types, uploaded to HuggingFace for reuse. | |
| --- | |
| ## Training Details | |
| | Parameter | Value | | |
| |---|---| | |
| | LoRA rank | 32 | | |
| | LoRA alpha | 32 | | |
| | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | | |
| | Trainable params | ~9.7M / 144M (6.77%) | | |
| | Quantization | 4-bit (QLoRA via Unsloth) | | |
| | Batch size | 32 | | |
| | Gradient accumulation | 4 (effective batch: 128) | | |
| | Learning rate | 2e-4 (linear decay) | | |
| | Warmup ratio | 0.03 | | |
| | Epochs | 3 | | |
| | Total steps | 11,355 | | |
| | Sequence length | 2048 (packed) | | |
| | Chat template | ChatML | | |
| | Response-only training | Yes (loss on assistant turns only) | |
| | Data variations | 2 (conversation extension), giving ~600K effective examples | |
| | Hardware | NVIDIA RTX 4090 | | |
| | Training time | ~10 hours | | |
| --- | |
| ## Evaluation | |
| ### Method | |
| 1000 samples drawn from the `paper_instructions_300K-v1` test split. The fine-tuned model generates responses, which are then scored by `grok-3-mini` as an LLM judge. | |
| ### Judge Prompt (4 dimensions, 1–5 scale) | |
| - **Faithfulness:** Does the response contain only factually correct claims? Penalise hallucinations. | |
| - **Answer Correctness:** How closely does the response match the ground truth semantically? | |
| - **Relevance:** Does the response directly address what was asked, without padding or going off-topic? | |
| - **Completeness:** Does the response cover the key points from the ground truth without omitting important details? | |
| ### Results | |
| | Metric | Score (1–5) | |
| |---|---| | |
| | Faithfulness | 2.70 | | |
| | Answer Correctness | 1.98 | | |
| | Relevance | **3.04** | | |
| | Completeness | 1.85 | | |
| | **Overall** | **2.39** | | |
| **Interpretation:** Relevance is the strongest dimension: the model stays on topic. Answer correctness and completeness are limited by the 135M parameter count; the model understands task structure but struggles to recall and reproduce factual content precisely. | |
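The reported overall score is consistent with an unweighted mean of the four dimension scores. The card does not state the aggregation rule, so treating it as a plain average is an assumption; the sketch below simply reproduces the figure from the table.

```python
from statistics import mean

# Per-dimension averages copied from the results table.
scores = {
    "faithfulness": 2.70,
    "answer_correctness": 1.98,
    "relevance": 3.04,
    "completeness": 1.85,
}

overall = mean(scores.values())
strongest = max(scores, key=scores.get)
print(f"overall={overall:.2f}, strongest={strongest}")
```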
| --- | |
| ## PaperResearcher API | |
| The model is designed to be used as a structured API, not a free-form chatbot. The `PaperResearcher` class exposes 10 typed methods, each using the exact instruction strings the model was trained on: | |
| ```python | |
| from __future__ import annotations  # type hints below are not evaluated at runtime | |
| from paper_researcher import PaperResearcher | |
| researcher = PaperResearcher("JaydeepR/SmolLM-135M-SFT-exp01") | |
| passage = "Attention mechanisms compute weighted sums of values..." | |
| # Two passages for the cross-passage tasks (the second one is illustrative) | |
| passage_a = passage | |
| passage_b = "Convolutional layers apply learned filters over local regions..." | |
| # Extract key points | |
| bullets: list[str] = researcher.extract_bullets(passage) | |
| # Generate Q&A pairs | |
| pairs: list[QAPair] = researcher.generate_qa_pairs(passage) | |
| # → [QAPair(question="What does attention compute?", answer="Weighted sums of values")] | |
| # Extract knowledge graph triplets | |
| triplets: list[Triplet] = researcher.extract_triplets(passage) | |
| # → [Triplet(subject="attention", relation="computes", object="weighted sums")] | |
| # Answer a question given a passage | |
| answer: str = researcher.answer("What does attention compute?", passage) | |
| # Rephrase and elaborate | |
| rephrased: str = researcher.rephrase(passage) | |
| # Continue a passage from its beginning | |
| continuation: str = researcher.continue_from(passage[:200]) | |
| # Extract a single key fact | |
| fact: str = researcher.extract_fact(passage) | |
| # Generate a question from a passage | |
| question: str = researcher.generate_question(passage) | |
| # Compare two passages | |
| comparison: str = researcher.compare(passage_a, passage_b) | |
| # Retrieval: which passage answers the question? | |
| result: RetrievalResult = researcher.find_relevant(question, [passage_a, passage_b]) | |
| # → RetrievalResult(index=0, reasoning="Passage 1 directly defines...") | |
| ``` | |
| ### Return Types | |
| | Method | Return Type | Description | | |
| |---|---|---| | |
| | `extract_bullets` | `list[str]` | Parsed bullet points | | |
| | `generate_qa_pairs` | `list[QAPair]` | `.question` and `.answer` fields | | |
| | `extract_triplets` | `list[Triplet]` | `.subject`, `.relation`, `.object` fields | | |
| | `find_relevant` | `RetrievalResult` | `.index` (0-based), `.reasoning` | | |
| | All others | `str` | Raw text response | | |
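How the raw model text becomes a typed return value is not shown in this card. As a hedged sketch of the kind of post-processing `extract_bullets` might need, the hypothetical `parse_bullets` helper below keeps only lines that look like markdown bullets; the real parser in `PaperResearcher` may differ.

```python
import re

# A bullet marker (-, * or +), at least one space, then the bullet text.
_BULLET = re.compile(r"^\s*[-*+]\s+(.*\S)\s*$")


def parse_bullets(response: str) -> list[str]:
    """Return the text of each markdown bullet line, ignoring other lines."""
    bullets = []
    for line in response.splitlines():
        match = _BULLET.match(line)
        if match:
            bullets.append(match.group(1))
    return bullets


raw = (
    "Key points:\n"
    "- Attention computes weighted sums of values\n"
    "* Weights come from query-key similarity\n"
    "\n"
    "not a bullet"
)
print(parse_bullets(raw))
```

A tolerant parser of this kind matters for a 135M model, whose output may include stray text before or after the list.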
| --- | |
| ## Raw Inference | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| from peft import PeftModel | |
| adapter_id = "JaydeepR/SmolLM-135M-SFT-exp01" | |
| base_model_id = "paperbd/smollm_135M_arxiv_cpt" | |
| tokenizer = AutoTokenizer.from_pretrained(adapter_id) | |
| model = AutoModelForCausalLM.from_pretrained(base_model_id) | |
| model = PeftModel.from_pretrained(model, adapter_id) | |
| messages = [ | |
| {"role": "system", "content": "You are an expert in AI and ML research. Your answers are concise and helpful."}, | |
| {"role": "user", "content": "Extract the important points from this passage as markdown bullet points.\n\nAttention mechanisms..."}, | |
| ] | |
| prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) | |
| inputs = tokenizer(prompt, return_tensors="pt") | |
| outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1, no_repeat_ngram_size=4) | |
| print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)) | |
| ``` | |
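For readers unfamiliar with ChatML, the sketch below renders messages in the generic ChatML layout. The `to_chatml` function is a hypothetical illustration; the tokenizer's own chat template is authoritative and may differ in details such as default system prompts or whitespace, so `apply_chat_template` should be used for real inference.

```python
def to_chatml(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in the generic ChatML layout."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to start its answer
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are an expert in AI and ML research."},
    {"role": "user", "content": "Extract the important points from this passage as markdown bullet points."},
]
prompt = to_chatml(messages)
print(prompt)
```

Because training computed the loss only on assistant turns, the trailing `<|im_start|>assistant` cue is where the model's trained behaviour begins.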
| --- | |
| ## Limitations | |
| - 135M parameter model: limited factual recall and reasoning capacity | |
| - Trained on synthetic data, so instruction format matters; use the exact prompts from `tasks.py` | |
| - Relevance is strongest (3.04/5); correctness and completeness are weak (< 2/5) | |
| - Best suited for structured extraction (bullets, triplets, QA) over open-ended generation | |
| - No comparison against the uninstructed base model yet; exp02 planned | |
| --- | |
| ## Related Models | |
| | Model | Description | | |
| |---|---| | |
| | [JaydeepR/SmolLM-135M-CPT-LoRA-r32](https://huggingface.co/JaydeepR/SmolLM-135M-CPT-LoRA-r32) | CPT base (this model's starting point) | | |
| | [HuggingFaceTB/SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M) | Original base model | | |
| --- | |
| ## Citation | |
| ``` | |
| @misc{smollm135m-sft-exp01, | |
| author = {Jaydeep Raijada}, | |
| title = {SmolLM-135M SFT exp01: Instruction Tuning on ML Paper Research Tasks}, | |
| year = {2026}, | |
| url = {https://huggingface.co/JaydeepR/SmolLM-135M-SFT-exp01} | |
| } | |
| ``` | |