Embedding Code snippets

#10
by noobmldude - opened

How would this perform for embedding code snippets, or chunks containing code in languages such as Python or C?

Hey noobmldude! The training data is ~1% code, so it'll handle hybrid content (code + natural language, docstrings, etc.), but it's not optimized for pure code retrieval. If your use case is mostly code-to-code search, a code-specific model will likely do better. Worth benchmarking on your own data to see.
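For the benchmarking step, here is a minimal sketch of a recall@k check over your own query/snippet pairs. Everything in it is an assumption for illustration, not part of this model's API: `recall_at_k`, `cosine`, and `toy_embed` are hypothetical helper names, and `toy_embed` is only a hashed bag-of-tokens placeholder so the script runs without downloading anything. To compare real models, pass in any callable that maps a list of strings to a list of vectors.

```python
import math
import re
import zlib
from typing import Callable, List, Sequence

# An embedder maps a list of texts to one vector per text.
Embedder = Callable[[List[str]], Sequence[Sequence[float]]]


def toy_embed(texts: List[str], dim: int = 64) -> List[List[float]]:
    """Placeholder embedder (NOT a real model): hashed bag-of-tokens counts."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for token in re.findall(r"[a-z0-9_]+", text.lower()):
            # crc32 is deterministic across runs, unlike the built-in hash().
            vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
        vectors.append(vec)
    return vectors


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(
    embed: Embedder,
    queries: List[str],
    corpus: List[str],
    relevant: List[int],
    k: int = 1,
) -> float:
    """Fraction of queries whose relevant corpus index appears in the top k.

    relevant[i] is the index in `corpus` of the correct snippet for queries[i].
    """
    query_vecs = embed(queries)
    corpus_vecs = embed(corpus)
    hits = 0
    for qi, qv in enumerate(query_vecs):
        # Rank corpus indices by similarity to the query, best first.
        ranked = sorted(
            range(len(corpus)),
            key=lambda ci: cosine(qv, corpus_vecs[ci]),
            reverse=True,
        )
        if relevant[qi] in ranked[:k]:
            hits += 1
    return hits / len(queries)


if __name__ == "__main__":
    # Tiny made-up example; replace with pairs drawn from your own data.
    corpus = [
        "def read_json(path): return json.load(open(path))",
        "int add(int a, int b) { return a + b; }",
    ]
    queries = ["load a json file from a path", "add two integers in C"]
    relevant = [0, 1]
    print("recall@1:", recall_at_k(toy_embed, queries, corpus, relevant, k=1))
```

Running the same queries and corpus through this model and through a code-specific one gives a like-for-like number. If a model is loadable through a library such as sentence-transformers, its `encode` method could likely be passed in place of `toy_embed`, since it accepts a list of strings and returns one vector per string.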
