Add model card and metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +37 -0
README.md CHANGED
@@ -1,3 +1,40 @@
  ---
  license: apache-2.0
+ pipeline_tag: feature-extraction
  ---
+
+ # AutoSelection Sparse Autoencoder (SAE)
+
+ This repository contains a Sparse Autoencoder (SAE) checkpoint associated with the paper [From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning](https://huggingface.co/papers/2605.12944).
+
+ ## Model description
+
+ AutoSelection is a budgeted solver for fixed-pool data recipe search. Instead of treating SFT data selection as a one-shot instance ranking problem, it searches over executable data-curation recipes that filter, mix, deduplicate, and recombine samples.
+
+ This SAE is used within the AutoSelection framework to extract task-, data-, and model-side signals during cold-start scoring and subset-state construction.
+
+ **Configuration:**
+ - **Architecture:** Top-K SAE
+ - **Input Dimension (d_in):** 2048
+ - **Expansion Factor:** 32
+ - **K:** 192
+
22
+ ## Resources
23
+
24
+ - **Paper:** [From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning](https://huggingface.co/papers/2605.12944)
25
+ - **Repository:** [GitHub](https://github.com/w253/AutoSelection)
26
+ - **Training Pool:** [k253/AutoSelection-90k](https://huggingface.co/datasets/k253/AutoSelection-90k)
27
+
28
+ ## Citation
29
+
30
+ ```bibtex
31
+ @misc{wu2026instanceselectionfixedpooldata,
32
+ title={From Instance Selection to Fixed-Pool Data Recipe Search for Supervised Fine-Tuning},
33
+ author={Haodong Wu and Jiahao Zhang and Lijie Hu and Yongqi Zhang},
34
+ year={2026},
35
+ eprint={2605.12944},
36
+ archivePrefix={arXiv},
37
+ primaryClass={cs.LG},
38
+ url={https://arxiv.org/abs/2605.12944},
39
+ }
40
+ ```
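
As a reading aid for the configuration listed in the model card (Top-K SAE, d_in 2048, expansion factor 32, K 192), here is a minimal sketch of what a Top-K SAE forward pass looks like. It is not taken from the AutoSelection repository. The function name `topk_sae_forward`, the weight layout, the ReLU applied to the kept activations, and the reading of the expansion factor as "latent width = d_in × 32" are all assumptions. The demo runs on small random weights, not on the checkpoint.

```python
import numpy as np

# Values from the model card.
D_IN = 2048
EXPANSION = 32
K = 192
# Assumption: the latent width is d_in times the expansion factor.
D_SAE = D_IN * EXPANSION


def topk_sae_forward(x, w_enc, b_enc, w_dec, b_dec, k):
    """Hypothetical Top-K SAE forward pass (illustrative, not the checkpoint's code).

    x:     (batch, d_in) activations
    w_enc: (d_in, d_sae), b_enc: (d_sae,)
    w_dec: (d_sae, d_in), b_dec: (d_in,)
    Returns the sparse latents and the reconstruction.
    """
    pre = x @ w_enc + b_enc                              # (batch, d_sae)
    # Indices of the k largest pre-activations in each row.
    idx = np.argpartition(pre, -k, axis=-1)[..., -k:]
    kept = np.take_along_axis(pre, idx, axis=-1)
    latents = np.zeros_like(pre)
    # Assumption: kept values pass through a ReLU; all other latents stay zero.
    np.put_along_axis(latents, idx, np.maximum(kept, 0.0), axis=-1)
    recon = latents @ w_dec + b_dec                      # (batch, d_in)
    return latents, recon


# Scaled-down demo so the example runs quickly; real sizes would be D_IN, D_SAE, K.
rng = np.random.default_rng(0)
d_in, d_sae, k = 16, 16 * 4, 6
x = rng.normal(size=(3, d_in))
w_enc = rng.normal(size=(d_in, d_sae)) / np.sqrt(d_in)
w_dec = rng.normal(size=(d_sae, d_in)) / np.sqrt(d_sae)
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_in)

latents, recon = topk_sae_forward(x, w_enc, b_enc, w_dec, b_dec, k)
print(D_SAE, latents.shape, recon.shape)
```

Under these assumptions, each input activates at most K of the D_SAE latents, which is what makes the extracted features sparse. Loading the actual checkpoint would need the key names and tensor layout used in the repository, which the model card does not state.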