---
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- skill-retrieval
- embedding
language:
- en
datasets:
- anonymous-ed-benchmark/skillret-benchmark
library_name: sentence-transformers
pipeline_tag: sentence-similarity
model-index:
- name: SkillRet-Embedding-0.6B
  results:
  - task:
      type: information-retrieval
      name: Skill Retrieval
    dataset:
      type: anonymous-ed-benchmark/skillret-benchmark
      name: SkillRet Benchmark (test)
      split: test
    metrics:
    - type: ndcg_at_5
      value: 0.753
      name: NDCG@5
    - type: ndcg_at_10
      value: 0.777
      name: NDCG@10
    - type: recall_at_10
      value: 0.852
      name: Recall@10
    - type: mrr_at_10
      value: 0.827
      name: MRR@10
---
# SkillRet-Embedding-0.6B
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned for **AI agent skill retrieval**. Given a natural-language user request, the model retrieves relevant agent skills from a large skill library.
The model is fine-tuned from [Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B) on the [SkillRet benchmark](https://huggingface.co/datasets/anonymous-ed-benchmark/skillret-benchmark) training split using contrastive learning (MultipleNegativesRankingLoss).
## Usage
### Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("anonymous-ed-benchmark/SkillRet-Embedding-0.6B", trust_remote_code=True)

# Queries carry an instruction prefix; skill documents are encoded as-is.
query_prompt = "Instruct: Given a skill search query, retrieve relevant skills that match the query\nQuery: "
queries = [
    query_prompt + "Help me set up a CI/CD pipeline for my Python project"
]
skills = [
    "ci-cd-setup | Configure continuous integration and deployment pipelines ...",
    "python-debugging | Debug Python applications using pdb and logging ...",
]

# With L2-normalized embeddings, the dot product equals cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
s_emb = model.encode(skills, normalize_embeddings=True)
similarities = q_emb @ s_emb.T  # shape: (num_queries, num_skills)
print(similarities)
```
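To turn a similarity matrix like the one above into a ranked shortlist, one option is to sort each query row by descending score. The sketch below is illustrative only: `top_k_skills` is a hypothetical helper, the skill name `docker-basics` is made up, and the scores are stand-in values rather than real model output, so it runs without downloading the model.

```python
import numpy as np

def top_k_skills(similarities, skills, k=5):
    """Return the k highest-scoring (skill, score) pairs for each query row."""
    similarities = np.asarray(similarities, dtype=float)
    k = min(k, similarities.shape[1])
    # Column indices sorted by descending score, truncated to k per query.
    order = np.argsort(-similarities, axis=1)[:, :k]
    return [
        [(skills[j], float(similarities[i, j])) for j in order[i]]
        for i in range(similarities.shape[0])
    ]

# Stand-in scores for one query against three skills (hypothetical values).
skills = ["ci-cd-setup", "python-debugging", "docker-basics"]
scores = np.array([[0.71, 0.32, 0.48]])
ranked = top_k_skills(scores, skills, k=2)
print(ranked)
```

For a large skill library, the skill embeddings can be computed once and cached, so only the query needs encoding at request time.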
## Training Details
- **Base model**: Qwen3-Embedding-0.6B (0.6B parameters)
- **Training data**: SkillRet benchmark training split (127,190 query–skill pairs from 63,259 queries and 10,123 skills)
- **Loss**: MultipleNegativesRankingLoss (InfoNCE) with cross-GPU negative sharing
- **Hardware**: 4× NVIDIA B200 GPUs (DDP)
- **Effective batch size**: 384 (96 per device × 4 GPUs)
- **Max sequence length**: 8,192 tokens
- **Learning rate**: 2e-5
- **Epochs**: 1
- **Training time**: ~6 hours
- **Precision**: BF16
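The in-batch contrastive objective listed above can be sketched in a few lines of NumPy. This is a simplified single-process illustration, not the training code used for this model: `in_batch_infonce` is a hypothetical helper, and `scale=20.0` is an assumed temperature, since the value used in training is not stated here. With cross-GPU negative sharing, the skill embeddings from all devices would be gathered first, which widens the set of negatives per query.

```python
import numpy as np

def in_batch_infonce(q_emb, s_emb, scale=20.0):
    """Mean cross-entropy where row i's positive is column i and all
    other columns in the batch act as negatives."""
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    s = s_emb / np.linalg.norm(s_emb, axis=1, keepdims=True)
    logits = scale * (q @ s.T)                           # (batch, batch) scaled cosine
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))                  # stand-in embeddings
aligned = in_batch_infonce(emb, emb)           # each query matches its own skill
shuffled = in_batch_infonce(emb, emb[::-1])    # positives deliberately mismatched
print(aligned < shuffled)
```

The loss is lower when each query is closest to its paired skill than when the pairing is scrambled, which is the signal the fine-tuning optimizes.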
### Training Logs
| Epoch | Step | Training Loss | NDCG@15 |
|:-----:|:----:|:------------:|:-------:|
| 0.15 | 50 | 2.4288 | 0.7802 |
| 0.30 | 100 | 1.9920 | 0.7842 |
| **0.45** | **150** | **1.9758** | **0.7887** |
| 0.60 | 200 | 1.9011 | 0.7865 |
| 0.76 | 250 | 1.9100 | 0.7874 |
| 0.91 | 300 | 1.9412 | 0.7859 |
| 1.0 | 331 | - | 0.7862 |
The best checkpoint, measured by NDCG@15, is at step 150 (bold row).
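The step and epoch columns follow from the figures in Training Details. The arithmetic below is a quick consistency check, assuming the final partial batch is dropped (an assumption, not something stated in this card).

```python
pairs = 127_190          # query-skill training pairs
per_device_batch = 96
num_gpus = 4

effective_batch = per_device_batch * num_gpus
# Assumption: the last incomplete batch is dropped, hence floor division.
steps_per_epoch = pairs // effective_batch

print(effective_batch, steps_per_epoch)  # 384 331
for step in (50, 100, 150, 200, 250, 300):
    # Fraction of the epoch completed at each logged step.
    print(step, f"{step / steps_per_epoch:.2f}")
```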
## Evaluation Results
Evaluated on the SkillRet benchmark test split (4,997 queries, 6,660 skills).
| Metric | @5 | @10 | @15 |
|--------|------|------|------|
| NDCG | 0.753 | 0.777 | 0.786 |
| Recall | 0.791 | 0.852 | 0.880 |
| MRR | 0.823 | 0.827 | 0.828 |
| MAP | 0.698 | 0.713 | 0.718 |
| Precision | 0.253 | 0.138 | 0.096 |

Accuracy@1 is 0.763.
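For readers unfamiliar with the metrics in the table, here is a minimal per-query sketch with binary relevance. It illustrates the standard definitions and is not the benchmark's evaluation code, which may differ in details such as tie handling or normalization. The function names and the toy ranking are hypothetical.

```python
import math

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant skills that appear in the top k."""
    hits = sum(1 for s in ranked[:k] if s in relevant)
    return hits / len(relevant)

def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant skill in the top k, else 0."""
    for rank, s in enumerate(ranked[:k], start=1):
        if s in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """DCG with binary gains, divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, s in enumerate(ranked[:k], start=1) if s in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Toy example: two relevant skills, one of them retrieved at rank 2.
ranked = ["b", "a", "c", "d"]
relevant = {"a", "d"}
print(recall_at_k(ranked, relevant, 3))  # 0.5
print(mrr_at_k(ranked, relevant, 3))     # 0.5
print(ndcg_at_k(ranked, relevant, 3))
```

Benchmark-level scores are then the mean of these per-query values over all test queries.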
## Intended Use
This model is designed for retrieving agent skills given natural-language user requests. It is part of the SkillRet benchmark submission for evaluating skill retrieval systems for AI agents.
## Limitations
- Optimized for English-language queries and agent skills.
- Performance may vary on domains outside the SkillRet benchmark distribution.
- The model retrieves skills but does not execute them.
## Framework Versions
- Python: 3.10.12
- Sentence Transformers: 5.4.1
- Transformers: 5.5.4
- PyTorch: 2.7.1+cu128
## Citation
Citation information will be added in the de-anonymized release.