---
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- skill-retrieval
- embedding
language:
- en
datasets:
- anonymous-ed-benchmark/skillret-benchmark
library_name: sentence-transformers
pipeline_tag: sentence-similarity
model-index:
- name: SkillRet-Embedding-8B
  results:
  - task:
      type: information-retrieval
      name: Skill Retrieval
    dataset:
      type: anonymous-ed-benchmark/skillret-benchmark
      name: SkillRet Benchmark (test)
      split: test
    metrics:
    - type: ndcg_at_5
      value: 0.8123
      name: NDCG@5
    - type: ndcg_at_10
      value: 0.8345
      name: NDCG@10
    - type: recall_at_10
      value: 0.9123
      name: Recall@10
---

# SkillRet-Embedding-8B

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned for **AI agent skill retrieval**. Given a natural-language user request, the model retrieves relevant agent skills from a large skill library.

The model is fine-tuned from [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) on the training split of the [SkillRet benchmark](https://huggingface.co/datasets/anonymous-ed-benchmark/skillret-benchmark) using contrastive learning (MultipleNegativesRankingLoss).

## Usage

### Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("anonymous-ed-benchmark/SkillRet-Embedding-8B", trust_remote_code=True)

# Queries carry an instruction prefix; skill documents are encoded as-is.
query_prompt = "Instruct: Given a skill search query, retrieve relevant skills that match the query\nQuery: "

queries = [
    query_prompt + "Help me set up a CI/CD pipeline for my Python project"
]
skills = [
    "ci-cd-setup | Configure continuous integration and deployment pipelines ...",
    "python-debugging | Debug Python applications using pdb and logging ...",
]

# Embeddings are unit-normalized, so the dot product below is cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
s_emb = model.encode(skills, normalize_embeddings=True)

# Shape: (number of queries, number of skills)
similarities = q_emb @ s_emb.T
print(similarities)
```
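
The similarity matrix from the snippet above has one row per query and one column per skill. A minimal sketch of turning one row into a ranked shortlist is shown below. `top_k_skills` is a hypothetical helper (not part of sentence-transformers), and the skill names and scores in the example are made-up placeholders rather than model output.

```python
def top_k_skills(scores, skills, k=5):
    """Return the k highest-scoring (skill, score) pairs, best first.

    `scores` is one row of the query-by-skill similarity matrix, for
    example `similarities[0]`. Hypothetical helper for illustration.
    """
    if len(scores) != len(skills):
        raise ValueError("expected exactly one score per skill")
    ranked = sorted(zip(skills, scores), key=lambda pair: pair[1], reverse=True)
    return [(skill, float(score)) for skill, score in ranked[:k]]


# Placeholder values for illustration only (not produced by the model).
example_skills = ["ci-cd-setup", "python-debugging", "docker-basics"]
example_scores = [0.71, 0.32, 0.48]
shortlist = top_k_skills(example_scores, example_skills, k=2)
print(shortlist)
```

Because both sides are encoded with `normalize_embeddings=True`, the scores are cosine similarities, so sorting a row in descending order gives the retrieval ranking for that query.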

## Training Details

- **Base model**: Qwen3-Embedding-8B (8B parameters)
- **Training data**: SkillRet benchmark training split (127,190 query–skill pairs from 63,259 queries and 10,123 skills)
- **Loss**: MultipleNegativesRankingLoss (InfoNCE) with cross-GPU negative sharing
- **Hardware**: 4× NVIDIA B200 GPUs (DDP)
- **Effective batch size**: 80 (20 per device × 4 GPUs)
- **Max sequence length**: 8,192 tokens
- **Learning rate**: 2e-5
- **Epochs**: 1
- **Precision**: BF16
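
To make the loss listed above concrete, here is a minimal pure-Python sketch of in-batch InfoNCE: each query's paired skill is the positive, and every other skill in the batch acts as a negative. This is not the training code. `info_nce_loss` is a hypothetical name, and `scale=20.0` is an assumption chosen to mirror the usual default of `MultipleNegativesRankingLoss`; the value used for this model is not stated.

```python
import math


def info_nce_loss(sim, scale=20.0):
    """Mean in-batch InfoNCE loss over a square similarity matrix.

    sim[i][j] is the cosine similarity between query i and skill j;
    the positive for query i sits on the diagonal (j == i).
    `scale` is an assumed temperature multiplier, not a documented setting.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [scale * s for s in row]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the diagonal entry as the correct class.
        total += log_denominator - logits[i]
    return total / len(sim)


# Uninformative similarities give log(batch size); a clean diagonal gives ~0.
uniform_loss = info_nce_loss([[0.5, 0.5], [0.5, 0.5]])
separated_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]])
print(uniform_loss, separated_loss)
```

Under this reading, an effective batch size of 80 with negatives shared across GPUs would score each query against 80 candidate skills, 79 of them negatives, assuming no extra hard negatives are attached to the pairs.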

## Evaluation Results

Evaluated on the SkillRet benchmark test split (4,997 queries, 6,660 skills).

| Metric | @5 | @10 | @15 |
|--------|------|------|------|
| NDCG | 0.8123 | 0.8345 | 0.8418 |
| Recall | 0.8558 | 0.9123 | 0.9355 |
| Completeness | 0.7562 | 0.8463 | 0.8841 |
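
For readers unfamiliar with the cutoff metrics in the table, the sketch below shows the textbook per-query definitions of Recall@k and NDCG@k. It assumes binary relevance (a skill is either relevant to a query or not) and that the reported numbers are averages over queries; neither assumption is confirmed by this card, and the benchmark's own evaluation script may differ. Completeness is not defined here, so it is left out. The function names and toy data are hypothetical.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant skills that appear in the top k results."""
    hits = sum(1 for skill in ranked[:k] if skill in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary gains (assumption: no graded relevance)."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, skill in enumerate(ranked[:k])
        if skill in relevant
    )
    # Ideal ranking places every relevant skill at the top.
    ideal = sum(1.0 / math.log2(position + 2) for position in range(min(k, len(relevant))))
    return dcg / ideal


# Toy ranking for one query (placeholder identifiers, not benchmark data).
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(recall_at_k(ranked, relevant, 2), ndcg_at_k(ranked, relevant, 3))
```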

## Intended Use

This model is designed for retrieving agent skills given natural-language user requests. It is part of the SkillRet benchmark submission for evaluating skill retrieval systems for AI agents.

## Limitations

- Optimized for English-language queries and agent skills.
- Performance may vary on domains outside the SkillRet benchmark distribution.
- The model retrieves skills but does not execute them.

## Framework Versions

- Python: 3.10.12
- Sentence Transformers: 5.4.1
- Transformers: 5.5.4
- PyTorch: 2.7.1+cu128

## Citation

Citation information will be added in the de-anonymized release.