ntu-bge-small-zh-simcse-job-talent-matching

A BAAI/bge-small-zh-v1.5 model fine-tuned with supervised SimCSE for job-talent matching.

Training Details

  • Base model: BAAI/bge-small-zh-v1.5
  • Method: Supervised SimCSE (dual encoder + in-batch contrastive loss)
  • Task: Job-Talent matching (職缺-人才配對)
  • Temperature (τ): 0.05
  • Max length: 512
  • Batch size: 64
  • Optimizer: AdamW (lr=5e-5, wd=1e-2)
  • Early stopping: patience=3
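The dual-encoder setup above uses an in-batch contrastive (InfoNCE) loss: each job is paired with its matching talent profile at the same batch index, and every other talent in the batch serves as a negative. This is not the released training code, just a minimal sketch of that objective under the stated temperature τ=0.05:

```python
import torch
import torch.nn.functional as F

def simcse_in_batch_loss(job_emb, talent_emb, tau=0.05):
    # L2-normalize so dot products equal cosine similarities
    job_emb = F.normalize(job_emb, dim=-1)
    talent_emb = F.normalize(talent_emb, dim=-1)
    # sim[i][j] = cos(job_i, talent_j) / tau
    sim = job_emb @ talent_emb.T / tau
    # The positive for job i is talent i; all other talents
    # in the batch act as in-batch negatives.
    labels = torch.arange(sim.size(0))
    return F.cross_entropy(sim, labels)

# Toy batch of 4 random (job, talent) embedding pairs
torch.manual_seed(0)
jobs = torch.randn(4, 512)
talents = torch.randn(4, 512)
loss = simcse_in_batch_loss(jobs, talents)
```

A lower temperature sharpens the softmax over the batch, penalizing hard negatives more strongly.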

Usage

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("yenstdi/ntu-bge-small-zh-simcse-job-talent-matching")
model = AutoModel.from_pretrained("yenstdi/ntu-bge-small-zh-simcse-job-talent-matching")
model.eval()  # disable dropout so embeddings are deterministic

def encode(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling: average token embeddings, weighted by the attention mask
    mask = inputs["attention_mask"].unsqueeze(-1)
    embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
    # L2-normalize so a dot product between embeddings is cosine similarity
    return torch.nn.functional.normalize(embeddings, dim=-1)

job_emb = encode(["Software Engineer - Python, ML experience required"])
talent_emb = encode(["5 years Python developer with ML projects"])
score = (job_emb @ talent_emb.T).item()
print(f"Cosine similarity: {score:.4f}")
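In practice you would score one job against many candidate profiles. A hypothetical `rank_talents` helper (not part of this model's code) that takes the L2-normalized embeddings produced by `encode` above might look like this; random normalized vectors stand in for real embeddings:

```python
import torch

def rank_talents(job_emb, talent_embs, talent_texts):
    # Embeddings are L2-normalized, so dot product = cosine similarity
    scores = (talent_embs @ job_emb.T).squeeze(-1)
    order = scores.argsort(descending=True)
    return [(talent_texts[i], scores[i].item()) for i in order]

# Toy normalized embeddings standing in for encode() output
torch.manual_seed(0)
job = torch.nn.functional.normalize(torch.randn(1, 512), dim=-1)
talents = torch.nn.functional.normalize(torch.randn(3, 512), dim=-1)
ranking = rank_talents(job, talents, ["candidate A", "candidate B", "candidate C"])
```

Because the scores are cosine similarities, they can also be thresholded to filter out poor matches before ranking.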

Trained by

National Taiwan University (NTU)

Model size: 24M params (Safetensors, F32)
