Embedding Code snippets
#10
by noobmldude - opened
How would this perform for embedding code snippets, or chunks containing code in languages like Python, C, etc.?
Hey noobmldude! The training data is ~1% code, so it'll handle hybrid content (code + natural language, docstrings, etc.), but it's not optimized for pure code retrieval. If your use case is mostly code-to-code search, a code-specific model will likely do better. It's worth benchmarking on your own data to confirm.
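For the benchmarking step, here is a minimal sketch of a recall@k check you could run on your own query/snippet pairs. Everything in it is an assumption for illustration rather than part of this model's API: `recall_at_k`, `cosine`, and `char_count_embed` are hypothetical names, and `char_count_embed` is only a toy stand-in so the script runs without dependencies. In practice you would pass in a function that wraps whichever embedding model you are comparing.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
# Hypothetical interface: any callable mapping a list of texts to a list of vectors.
EmbedFn = Callable[[List[str]], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(embed: EmbedFn, queries: List[str], corpus: List[str],
                relevant: List[int], k: int = 1) -> float:
    """Fraction of queries whose relevant corpus item ranks in the top k.

    relevant[i] is the index into `corpus` of the correct item for queries[i].
    """
    if not queries or len(queries) != len(relevant):
        raise ValueError("queries and relevant must be non-empty and equal length")
    corpus_vecs = embed(corpus)
    query_vecs = embed(queries)
    hits = 0
    for query_vec, gold in zip(query_vecs, relevant):
        # Rank corpus indices by similarity to the query, best first.
        ranked = sorted(range(len(corpus)),
                        key=lambda i: cosine(query_vec, corpus_vecs[i]),
                        reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(queries)


def char_count_embed(texts: List[str]) -> List[Vector]:
    """Toy stand-in embedding (letter counts), NOT a real model."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return [[float(t.lower().count(ch)) for ch in letters] for t in texts]


if __name__ == "__main__":
    # Made-up example data; replace with snippets and queries from your use case.
    corpus = ["def add(a, b): return a + b",
              "for i in range(10): print(i)"]
    queries = ["function that adds two numbers",
               "loop that prints a range"]
    print(recall_at_k(char_count_embed, queries, corpus, relevant=[0, 1], k=1))
```

Running the same `recall_at_k` call once per candidate model, with an `embed` function for each, gives a like-for-like number on your own data. A few hundred labeled pairs that reflect your real mix of code and natural language will say more than any general claim about the training data.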