JaydeepR committed on
Commit bb41507 · verified · 1 Parent(s): f741c93

Upload HF_MODEL_CARD.md with huggingface_hub

Files changed (1)
  1. HF_MODEL_CARD.md +142 -0
HF_MODEL_CARD.md ADDED
@@ -0,0 +1,142 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ base_model: paperbd/smollm_135M_arxiv_cpt
+ tags:
+ - sft
+ - instruction-tuning
+ - lora
+ - unsloth
+ - scientific
+ - arxiv
+ - nlp
+ - paper-researcher
+ datasets:
+ - paperbd/paper_instructions_300K-v1
+ ---
+
+ # SmolLM-135M-SFT-exp01
+
+ Supervised fine-tuning of [SmolLM-135M-CPT-LoRA-r32](https://huggingface.co/JaydeepR/SmolLM-135M-CPT-LoRA-r32) on 300K synthetic ML paper instruction pairs.
+
+ This is **exp01** in a series of SFT experiments on top of the CPT-adapted SmolLM-135M.
+
+ ## Model Description
+
+ - **Base model:** `paperbd/smollm_135M_arxiv_cpt` (CPT-adapted SmolLM-135M, merged)
+ - **Method:** Supervised Fine-Tuning (SFT) with LoRA + `train_on_responses_only`
+ - **Domain:** ML/arXiv paper research tasks
+ - **Task:** Instruction following: bullets, QA, triplets, retrieval, comparison, etc.
+
+ ## Training Details
+
+ | Parameter | Value |
+ |---|---|
+ | LoRA rank | 32 |
+ | LoRA alpha | 32 |
+ | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+ | Trainable params | ~9.7M / 144M (6.77%) |
+ | Quantization | 4-bit (QLoRA via Unsloth) |
+ | Batch size | 32 |
+ | Gradient accumulation | 4 (effective batch: 128) |
+ | Learning rate | 2e-4 (linear decay) |
+ | Warmup ratio | 0.03 |
+ | Epochs | 3 |
+ | Total steps | 11,355 |
+ | Sequence length | 2048 (packed) |
+ | Chat template | ChatML |
+ | Hardware | NVIDIA RTX 4090 |
+ | Training time | ~10 hours |
+
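The trainable-parameter row in the table can be reproduced from the LoRA settings. The sketch below assumes the layer shapes of the public SmolLM-135M configuration (hidden size 576, 30 layers, 3 key/value heads of dimension 64, MLP width 1536) and a base size of 134,515,008 parameters. None of these values are stated in this card, so treat them as assumptions. It also assumes every optimizer step consumes one full effective batch.

```python
# Assumed SmolLM-135M shapes (from the public base config, not from this card).
HIDDEN = 576
KV_DIM = 3 * 64            # 3 key/value heads x head dimension 64
MLP = 1536
LAYERS = 30
BASE_PARAMS = 134_515_008  # assumed parameter count of the merged base model

RANK = 32  # LoRA rank from the table

# (in_features, out_features) for each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_param_count(rank, targets, layers):
    """LoRA adds A (rank x in) and B (out x rank) for every targeted matrix."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in targets.values())
    return per_layer * layers


lora_params = lora_param_count(RANK, TARGETS, LAYERS)
trainable_pct = 100 * lora_params / (BASE_PARAMS + lora_params)

# Batch and schedule arithmetic from the table
effective_batch = 32 * 4            # batch size x gradient accumulation
steps_per_epoch = 11_355 // 3       # total steps / epochs
sequences_per_epoch = steps_per_epoch * effective_batch

print(f"LoRA params: {lora_params:,} ({trainable_pct:.2f}% of total)")
print(f"Effective batch: {effective_batch}, steps/epoch: {steps_per_epoch}")
print(f"Packed sequences per epoch: {sequences_per_epoch:,}")
```

Under these assumptions the adapters come to 9,768,960 parameters, or 6.77% of base plus adapters, which agrees with the "Trainable params" row.
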
+ ## Training Data
+
+ - **Dataset:** `paperbd/paper_instructions_300K-v1`, 300K synthetic instruction-response pairs generated from arXiv ML papers
+ - **Variations:** 2 (conversation extension) → ~600K effective training examples
+ - **Train/val split:** 98% / 2%
+ - **Response-only training:** Loss computed only on assistant turns, not user prompts
+
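Response-only training is commonly implemented by masking the labels of every non-assistant token so that the loss ignores them. The pure-Python sketch below illustrates that idea only. The `mask_non_assistant` helper and the toy token IDs are hypothetical, and the ignore value -100 is the default `ignore_index` of PyTorch's cross-entropy loss rather than something stated in this card. Unsloth's `train_on_responses_only` works on real token IDs and template markers.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_non_assistant(token_ids, roles):
    """Return labels where only assistant tokens keep their ID (hypothetical helper)."""
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]


# Toy example: two system tokens, three user tokens, three assistant tokens
token_ids = [11, 12, 21, 22, 23, 31, 32, 33]
roles = ["system"] * 2 + ["user"] * 3 + ["assistant"] * 3

labels = mask_non_assistant(token_ids, roles)
print(labels)  # [-100, -100, -100, -100, -100, 31, 32, 33]

# Dataset arithmetic from the list above
total_examples = 300_000 * 2                  # 2 variations per pair
train_examples = total_examples * 98 // 100   # 98% train
val_examples = total_examples - train_examples
print(train_examples, val_examples)  # 588000 12000
```
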
+ ## Evaluation Results
+
+ Evaluated on 1,000 samples from the `paper_instructions_300K-v1` test split, judged by `grok-3-mini`:
+
+ | Metric | Score (1-5) |
+ |---|---|
+ | Faithfulness | 2.70 |
+ | Answer Correctness | 1.98 |
+ | Relevance | 3.04 |
+ | Completeness | 1.85 |
+ | **Overall** | **2.39** |
+
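The Overall row is consistent with an unweighted mean of the four per-metric scores. The card does not say how it was computed, so the averaging rule in this short check is an assumption.

```python
from statistics import mean

# Per-metric judge scores copied from the table
scores = {
    "Faithfulness": 2.70,
    "Answer Correctness": 1.98,
    "Relevance": 3.04,
    "Completeness": 1.85,
}

# Assumed aggregation: plain arithmetic mean of the four metrics
overall = mean(scores.values())
print(f"{overall:.2f}")  # 2.39
```
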
+ ## How to Use
+
+ ### As PaperResearcher API
+
+ ```python
+ from paper_researcher import PaperResearcher
+
+ researcher = PaperResearcher("JaydeepR/SmolLM-135M-SFT-exp01")
+
+ passage = "Attention mechanisms compute weighted sums of values..."
+
+ bullets = researcher.extract_bullets(passage)
+ qa_pairs = researcher.generate_qa_pairs(passage)
+ triplets = researcher.extract_triplets(passage)
+ answer = researcher.answer("What does attention compute?", passage)
+ ```
+
+ ### Raw inference
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ adapter_id = "JaydeepR/SmolLM-135M-SFT-exp01"
+ base_model_id = "paperbd/smollm_135M_arxiv_cpt"
+
+ # Load the tokenizer from the adapter repo, then attach the LoRA adapter to the merged CPT base.
+ tokenizer = AutoTokenizer.from_pretrained(adapter_id)
+ model = AutoModelForCausalLM.from_pretrained(base_model_id)
+ model = PeftModel.from_pretrained(model, adapter_id)
+ model.eval()  # inference mode
+
+ messages = [
+     {"role": "system", "content": "You are an expert in AI and ML research."},
+     {"role": "user", "content": "Extract the key points from this passage as bullet points.\n\nAttention mechanisms..."},
+ ]
+ # Render the ChatML prompt and end it with the assistant header so generation starts a reply.
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=256, repetition_penalty=1.1)
+ # Decode only the newly generated tokens, skipping the prompt.
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+ ```
+
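For reference, a ChatML chat template typically renders a message list into the layout sketched below. The `render_chatml` function is a hypothetical stand-in written for illustration. The template actually shipped with this tokenizer may differ in whitespace or in a default system prompt, so `tokenizer.apply_chat_template` remains the reliable way to build prompts.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Format messages in the usual ChatML layout (illustrative, not the tokenizer's own template)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are an expert in AI and ML research."},
    {"role": "user", "content": "Extract the key points from this passage as bullet points."},
]

prompt = render_chatml(messages)
print(prompt)
```

The trailing `<|im_start|>assistant` header is what `add_generation_prompt=True` contributes in the raw inference snippet: without it the model would have no open assistant turn to complete.
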
+ ## Supported Tasks
+
+ | Task | Method |
+ |---|---|
+ | Extract bullet points | `researcher.extract_bullets(passage)` |
+ | Generate Q&A pairs | `researcher.generate_qa_pairs(passage)` |
+ | Generate a question | `researcher.generate_question(passage)` |
+ | Extract a fact | `researcher.extract_fact(passage)` |
+ | Answer a question | `researcher.answer(question, passage)` |
+ | Rephrase passage | `researcher.rephrase(passage)` |
+ | Continue passage | `researcher.continue_from(passage_start)` |
+ | Extract knowledge graph | `researcher.extract_triplets(passage)` |
+ | Compare two passages | `researcher.compare(passage_a, passage_b)` |
+ | Retrieval | `researcher.find_relevant(question, passages)` |
+
+ ## Limitations
+
+ - 135M-parameter model: limited factual recall and reasoning
+ - Trained on synthetic data, so it may not generalise to all instruction styles
+ - Relevance is the strongest dimension (3.04/5); answer correctness (1.98/5) and completeness (1.85/5) are weak
+ - Best used for structured extraction tasks, not open-ended QA
+
+ ## Citation
+
+ ```bibtex
+ @misc{smollm135m-sft-exp01,
+   author = {Jaydeep Raijada},
+   title = {SmolLM-135M SFT exp01: Instruction Tuning on ML Paper Research Tasks},
+   year = {2026},
+   url = {https://huggingface.co/JaydeepR/SmolLM-135M-SFT-exp01}
+ }
+ ```