ntu-bge-small-zh-simcse-job-talent-matching

A BAAI/bge-small-zh-v1.5 model fine-tuned with supervised SimCSE for job-talent matching.

Training Details

  • Base model: BAAI/bge-small-zh-v1.5
  • Method: Supervised SimCSE (dual encoder + in-batch contrastive loss)
  • Task: Job-Talent matching (職缺-人才配對)
  • Temperature (τ): 0.05
  • Max length: 512
  • Batch size: 64
  • Optimizer: AdamW (lr=5e-5, wd=1e-2)
  • Early stopping: patience=3
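The dual-encoder setup above uses an in-batch contrastive (InfoNCE) loss: each job is paired with its matching talent profile at the same batch index, and every other talent in the batch serves as a negative. This is not the released training code, just a minimal sketch of that objective under the stated temperature τ=0.05:

```python
import torch
import torch.nn.functional as F

def simcse_in_batch_loss(job_emb, talent_emb, tau=0.05):
    # L2-normalize so dot products equal cosine similarities
    job_emb = F.normalize(job_emb, dim=-1)
    talent_emb = F.normalize(talent_emb, dim=-1)
    # sim[i][j] = cos(job_i, talent_j) / tau
    sim = job_emb @ talent_emb.T / tau
    # The positive for job i is talent i; all other talents
    # in the batch act as in-batch negatives.
    labels = torch.arange(sim.size(0))
    return F.cross_entropy(sim, labels)

# Toy batch of 4 random (job, talent) embedding pairs
torch.manual_seed(0)
jobs = torch.randn(4, 512)
talents = torch.randn(4, 512)
loss = simcse_in_batch_loss(jobs, talents)
```

A lower temperature sharpens the softmax over the batch, penalizing hard negatives more strongly.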

Usage

from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("yenstdi/ntu-bge-small-zh-simcse-job-talent-matching")
model = AutoModel.from_pretrained("yenstdi/ntu-bge-small-zh-simcse-job-talent-matching")
model.eval()  # disable dropout so embeddings are deterministic

def encode(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Mean pooling: average token embeddings, weighted by the attention mask
    mask = inputs["attention_mask"].unsqueeze(-1)
    embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-6)
    # L2-normalize so a dot product between embeddings is cosine similarity
    return torch.nn.functional.normalize(embeddings, dim=-1)

job_emb = encode(["Software Engineer - Python, ML experience required"])
talent_emb = encode(["5 years Python developer with ML projects"])
score = (job_emb @ talent_emb.T).item()
print(f"Cosine similarity: {score:.4f}")
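In practice you would score one job against many candidate profiles. A hypothetical `rank_talents` helper (not part of this model's code) that takes the L2-normalized embeddings produced by `encode` above might look like this; random normalized vectors stand in for real embeddings:

```python
import torch

def rank_talents(job_emb, talent_embs, talent_texts):
    # Embeddings are L2-normalized, so dot product = cosine similarity
    scores = (talent_embs @ job_emb.T).squeeze(-1)
    order = scores.argsort(descending=True)
    return [(talent_texts[i], scores[i].item()) for i in order]

# Toy normalized embeddings standing in for encode() output
torch.manual_seed(0)
job = torch.nn.functional.normalize(torch.randn(1, 512), dim=-1)
talents = torch.nn.functional.normalize(torch.randn(3, 512), dim=-1)
ranking = rank_talents(job, talents, ["candidate A", "candidate B", "candidate C"])
```

Because the scores are cosine similarities, they can also be thresholded to filter out poor matches before ranking.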

Trained by

National Taiwan University (NTU)

Model size: 24M params (Safetensors, F32)
