Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.01613

Open Source Long Context Text Embedders

nomic-ai/nomic-embed-text-v1

Sentence Similarity • 0.1B • Updated 12 days ago • 4.79M • 562
nomic-ai/nomic-embed-text-v1.5

Sentence Similarity • 0.1B • Updated 12 days ago • 11.9M • 799
nomic-ai/nomic-embed-text-v1-unsupervised

Sentence Similarity • Updated Aug 2, 2024 • 808 • 15
nomic-ai/nomic-embed-text-v1-ablated

Sentence Similarity • Updated Aug 2, 2024 • 155 • 4

eg: Text-to-images for mac

Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 17

Representation Learning

Text and Code Embeddings by Contrastive Pre-Training

Paper • 2201.10005 • Published Jan 24, 2022
Towards General Text Embeddings with Multi-stage Contrastive Learning

Paper • 2308.03281 • Published Aug 7, 2023 • 3
Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 17
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

Paper • 2405.06932 • Published May 11, 2024 • 20

TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4, 2024 • 95
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24, 2024 • 47
SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26, 2024 • 73
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29, 2024 • 53

Towards General Text Embeddings with Multi-stage Contrastive Learning

Paper • 2308.03281 • Published Aug 7, 2023 • 3
NEFTune: Noisy Embeddings Improve Instruction Finetuning

Paper • 2310.05914 • Published Oct 9, 2023 • 14
EELBERT: Tiny Models through Dynamic Embeddings

Paper • 2310.20144 • Published Oct 31, 2023 • 3
Dynamic Word Embeddings for Evolving Semantic Discovery

Paper • 1703.00607 • Published Mar 2, 2017 • 1

Text Embedding Papers

EmbeddingGemma: Powerful and Lightweight Text Representations

Paper • 2509.20354 • Published Sep 24, 2025 • 48
CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity

Paper • 2508.11442 • Published Aug 15, 2025 • 4
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5, 2025 • 81
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Paper • 2402.03216 • Published Feb 5, 2024 • 7

long context embedding

Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 17

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 153
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 32
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 22
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69

facebook/seamless-m4t-v2-large

Automatic Speech Recognition • 2B • Updated Jan 4, 2024 • 70k • 971
openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 4.86M • • 5.6k
Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 17

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Paper • 2310.05737 • Published Oct 9, 2023 • 6
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Paper • 2308.16692 • Published Aug 31, 2023 • 1
Towards General Text Embeddings with Multi-stage Contrastive Learning

Paper • 2308.03281 • Published Aug 7, 2023 • 3
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings

Paper • 2305.11554 • Published May 19, 2023 • 2

Open Source Long Context Text Embedders

nomic-ai/nomic-embed-text-v1

Sentence Similarity • 0.1B • Updated 12 days ago • 4.79M • 562
nomic-ai/nomic-embed-text-v1.5

Sentence Similarity • 0.1B • Updated 12 days ago • 11.9M • 799
nomic-ai/nomic-embed-text-v1-unsupervised

Sentence Similarity • Updated Aug 2, 2024 • 808 • 15
nomic-ai/nomic-embed-text-v1-ablated

Sentence Similarity • Updated Aug 2, 2024 • 155 • 4

Text Embedding Papers

EmbeddingGemma: Powerful and Lightweight Text Representations

Paper • 2509.20354 • Published Sep 24, 2025 • 48
CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity

Paper • 2508.11442 • Published Aug 15, 2025 • 4
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Paper • 2506.05176 • Published Jun 5, 2025 • 81
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Paper • 2402.03216 • Published Feb 5, 2024 • 7

eg: Text-to-images for mac

Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 17

long context embedding

Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 17

Representation Learning

Text and Code Embeddings by Contrastive Pre-Training

Paper • 2201.10005 • Published Jan 24, 2022
Towards General Text Embeddings with Multi-stage Contrastive Learning

Paper • 2308.03281 • Published Aug 7, 2023 • 3
Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 17
Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

Paper • 2405.06932 • Published May 11, 2024 • 20

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 153
ReFT: Reasoning with Reinforced Fine-Tuning

Paper • 2401.08967 • Published Jan 17, 2024 • 32
Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 22
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 69

TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4, 2024 • 95
MM-LLMs: Recent Advances in MultiModal Large Language Models

Paper • 2401.13601 • Published Jan 24, 2024 • 47
SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26, 2024 • 73
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

Paper • 2401.16380 • Published Jan 29, 2024 • 53

facebook/seamless-m4t-v2-large

Automatic Speech Recognition • 2B • Updated Jan 4, 2024 • 70k • 971
openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 4.86M • • 5.6k
Nomic Embed: Training a Reproducible Long Context Text Embedder

Paper • 2402.01613 • Published Feb 2, 2024 • 17

Towards General Text Embeddings with Multi-stage Contrastive Learning

Paper • 2308.03281 • Published Aug 7, 2023 • 3
NEFTune: Noisy Embeddings Improve Instruction Finetuning

Paper • 2310.05914 • Published Oct 9, 2023 • 14
EELBERT: Tiny Models through Dynamic Embeddings

Paper • 2310.20144 • Published Oct 31, 2023 • 3
Dynamic Word Embeddings for Evolving Semantic Discovery

Paper • 1703.00607 • Published Mar 2, 2017 • 1

Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

Paper • 2310.05737 • Published Oct 9, 2023 • 6
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models

Paper • 2308.16692 • Published Aug 31, 2023 • 1
Towards General Text Embeddings with Multi-stage Contrastive Learning

Paper • 2308.03281 • Published Aug 7, 2023 • 3
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings

Paper • 2305.11554 • Published May 19, 2023 • 2

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs