---
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- skill-retrieval
- embedding
language:
- en
datasets:
- anonymous-ed-benchmark/skillret-benchmark
library_name: sentence-transformers
pipeline_tag: sentence-similarity
model-index:
- name: SkillRet-Embedding-0.6B
  results:
  - task:
      type: information-retrieval
      name: Skill Retrieval
    dataset:
      type: anonymous-ed-benchmark/skillret-benchmark
      name: SkillRet Benchmark (test)
      split: test
    metrics:
    - type: ndcg_at_5
      value: 0.753
      name: NDCG@5
    - type: ndcg_at_10
      value: 0.777
      name: NDCG@10
    - type: recall_at_10
      value: 0.852
      name: Recall@10
    - type: mrr_at_10
      value: 0.827
      name: MRR@10
---

# SkillRet-Embedding-0.6B

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned for **AI agent skill retrieval**. Given a natural-language user request, the model retrieves relevant agent skills from a large skill library.

The model is fine-tuned from [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) on the [SkillRet benchmark](https://huggingface.co/datasets/anonymous-ed-benchmark/skillret-benchmark) training split using contrastive learning (MultipleNegativesRankingLoss).

## Usage

### Sentence Transformers

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("anonymous-ed-benchmark/SkillRet-Embedding-0.6B", trust_remote_code=True)

# Queries get an instruction prefix; skill documents are encoded without one.
query_prompt = "Instruct: Given a skill search query, retrieve relevant skills that match the query\nQuery: "

queries = [
    query_prompt + "Help me set up a CI/CD pipeline for my Python project"
]
# Skills in this example are written as "name | description" text.
skills = [
    "ci-cd-setup | Configure continuous integration and deployment pipelines ...",
    "python-debugging | Debug Python applications using pdb and logging ...",
]

# L2-normalized embeddings, so the dot product equals cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
s_emb = model.encode(skills, normalize_embeddings=True)

similarities = q_emb @ s_emb.T  # shape: (num_queries, num_skills)
print(similarities)
```

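To turn the similarity matrix into a ranked skill list, sort each query's row in descending order and keep the top k. The minimal sketch below does this in plain Python on a small hand-written score matrix; the helper name `top_k_skills`, the `docker-compose` skill, and the scores are hypothetical. With the NumPy array produced by the snippet above, `numpy.argsort(-similarities, axis=1)[:, :k]` gives the same top-k indices.

```python
from typing import List, Sequence, Tuple


def top_k_skills(
    similarities: Sequence[Sequence[float]],
    skill_names: Sequence[str],
    k: int = 5,
) -> List[List[Tuple[str, float]]]:
    """Return the k highest-scoring (skill, score) pairs for each query row.

    `similarities` is a queries x skills matrix of cosine scores, such as
    `q_emb @ s_emb.T` computed from normalized embeddings.
    """
    ranked = []
    for row in similarities:
        # Pair every skill with its score, then sort by score, highest first.
        scored = sorted(zip(skill_names, row), key=lambda pair: pair[1], reverse=True)
        ranked.append([(name, float(score)) for name, score in scored[:k]])
    return ranked


# Hypothetical scores for one query against three skills.
scores = [[0.71, 0.42, 0.18]]
names = ["ci-cd-setup", "python-debugging", "docker-compose"]
for hits in top_k_skills(scores, names, k=2):
    for rank, (name, score) in enumerate(hits, start=1):
        print(f"{rank}. {name} ({score:.2f})")
```

For a library of thousands of skills, the skill embeddings only need to be computed once and can be reused for every query.
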
## Training Details

- **Base model**: Qwen3-Embedding-0.6B (0.6B parameters)
- **Training data**: SkillRet benchmark training split (127,190 query–skill pairs from 63,259 queries and 10,123 skills)
- **Loss**: MultipleNegativesRankingLoss (InfoNCE) with cross-GPU negative sharing
- **Hardware**: 4× NVIDIA B200 GPUs (DDP)
- **Effective batch size**: 384 (96 per device × 4 GPUs)
- **Max sequence length**: 8,192 tokens
- **Learning rate**: 2e-5
- **Epochs**: 1
- **Training time**: ~6 hours
- **Precision**: BF16

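MultipleNegativesRankingLoss treats each (query, skill) pair in a batch as a positive and every other skill in the batch as a negative: it scales the cosine similarities (the sentence-transformers default scale is 20) and applies cross-entropy with the matching skill as the target class. Assuming cross-GPU sharing gathers every positive in the effective batch of 384, each query is scored against 384 candidate skills, i.e. 383 in-batch negatives. The pure-Python sketch below computes this classic in-batch formulation for a toy batch; the function names are hypothetical, and it is not the library's implementation, which operates on PyTorch tensors and also accepts hard-negative columns.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def in_batch_infonce(
    queries: List[Sequence[float]],
    skills: List[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where skills[i] is the positive for queries[i]
    and every other skill in the batch acts as a negative."""
    assert len(queries) == len(skills)
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, s) for s in skills]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the matching skill (index i).
        total += log_denom - logits[i]
    return total / len(queries)


# Toy batch: query i points in the same direction as skill i.
queries = [[1.0, 0.1], [0.1, 1.0]]
skills = [[0.9, 0.0], [0.0, 0.9]]
print(round(in_batch_infonce(queries, skills), 4))
```

Larger batches add more negatives per query, which is why sharing negatives across the four GPUs is useful for this loss.
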
### Training Logs

| Epoch | Step | Training Loss | NDCG@15 |
|:-----:|:----:|:------------:|:-------:|
| 0.15 | 50 | 2.4288 | 0.7802 |
| 0.30 | 100 | 1.9920 | 0.7842 |
| **0.45** | **150** | **1.9758** | **0.7887** |
| 0.60 | 200 | 1.9011 | 0.7865 |
| 0.76 | 250 | 1.9100 | 0.7874 |
| 0.91 | 300 | 1.9412 | 0.7859 |
| 1.0 | 331 | - | 0.7862 |

The best checkpoint by NDCG@15 is at step 150 (bold row).

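The step counts in the log follow from the training setup: 127,190 pairs divided by an effective batch of 384 gives about 331.2 batches, which matches the 331 steps shown if the final partial batch is dropped (an assumption; the card does not state it). Each logged epoch value is then the step divided by 331, and the best checkpoint is the row with the highest NDCG@15. The short check below reproduces these numbers from the table.

```python
pairs = 127_190
per_device_batch, num_gpus = 96, 4
effective_batch = per_device_batch * num_gpus  # 384

# Assumption: the last incomplete global batch is dropped.
steps_per_epoch = pairs // effective_batch
print(effective_batch, steps_per_epoch, round(pairs / effective_batch, 2))

# (step, NDCG@15) rows copied from the training log above.
log = [(50, 0.7802), (100, 0.7842), (150, 0.7887), (200, 0.7865),
       (250, 0.7874), (300, 0.7859), (331, 0.7862)]

for step, ndcg in log:
    epoch = step / steps_per_epoch
    print(f"step {step:>3}: epoch {epoch:.2f}, NDCG@15 {ndcg:.4f}")

# Select the checkpoint with the highest NDCG@15.
best_step, best_ndcg = max(log, key=lambda row: row[1])
print("best checkpoint:", best_step, best_ndcg)
```
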
## Evaluation Results

Evaluated on the SkillRet benchmark test split (4,997 queries, 6,660 skills).

| Metric | @5 | @10 | @15 |
|--------|------|------|------|
| NDCG | 0.753 | 0.777 | 0.786 |
| Recall | 0.791 | 0.852 | 0.880 |
| MRR | 0.823 | 0.827 | 0.828 |
| MAP | 0.698 | 0.713 | 0.718 |
| Precision | 0.253 | 0.138 | 0.096 |

Accuracy@1 (fraction of queries whose top-ranked skill is relevant): 0.763.

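All of these metrics use binary relevance: a retrieved skill either is or is not among a query's relevant skills, and the reported numbers are means over the test queries. The sketch below computes the per-query values for one ranked list using the standard definitions (the same form used by sentence-transformers' `InformationRetrievalEvaluator`); it assumes the benchmark uses these definitions, although its own evaluation script may differ in details such as MAP normalization. The function name `metrics_at_k` and the toy ranking are hypothetical.

```python
import math
from typing import Dict, List, Set


def metrics_at_k(ranked: List[str], relevant: Set[str], k: int) -> Dict[str, float]:
    """Binary-relevance retrieval metrics for a single query.

    Assumes `relevant` is non-empty, i.e. every query has at least one relevant skill.
    """
    top = ranked[:k]
    hits = [1 if skill in relevant else 0 for skill in top]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant skill within the top k (0 if none).
    rr = next((1.0 / (i + 1) for i, h in enumerate(hits) if h), 0.0)

    # DCG with a log2(rank + 1) discount, normalized by the ideal ordering.
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))

    # Average precision, normalized by the best achievable number of hits.
    running, ap_sum = 0, 0.0
    for i, h in enumerate(hits):
        if h:
            running += 1
            ap_sum += running / (i + 1)

    return {
        "accuracy": 1.0 if num_hits else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant),
        "mrr": rr,
        "ndcg": dcg / idcg,
        "map": ap_sum / min(len(relevant), k),
    }


# Hypothetical ranking: the single relevant skill appears at rank 2.
ranked = ["python-debugging", "ci-cd-setup", "docker-compose"]
print(metrics_at_k(ranked, {"ci-cd-setup"}, k=5))
```

Because most queries have only one or two relevant skills, Precision@k is bounded well below 1 at larger k, which explains why it drops from @5 to @15 while Recall rises.
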
## Intended Use

This model is designed for retrieving agent skills given natural-language user requests. It accompanies the SkillRet benchmark submission, which evaluates skill retrieval systems for AI agents.

## Limitations

- Optimized for English-language queries and agent skills.
- Performance may vary on domains outside the SkillRet benchmark distribution.
- The model retrieves skills but does not execute them.

## Framework Versions

- Python: 3.10.12
- Sentence Transformers: 5.4.1
- Transformers: 5.5.4
- PyTorch: 2.7.1+cu128

## Citation

Citation information will be added in the de-anonymized release.