How to use calneymgp/braza-embedding-ptbr-v1 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("calneymgp/braza-embedding-ptbr-v1")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
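The `model.similarity` call in the snippet above returns a matrix of pairwise scores. As a rough illustration of how such scores can drive semantic search, here is a minimal standard-library sketch that ranks documents against a query by cosine similarity. It assumes cosine similarity is the scoring function in use (the usual default in sentence-transformers, though a model's configuration can override it). The helper names `cosine` and `rank` and the toy 3-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]

ranking = rank(query, docs)
for index, score in ranking:
    print(index, round(score, 3))
```

In practice the vectors would come from `model.encode(...)`, and the library's own `model.similarity` would replace the hand-written `cosine` helper.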
docs: add base_model field + tamz.ai backlinks
README.md
CHANGED
@@ -3,6 +3,7 @@ language:
 - pt
 license: apache-2.0
 library_name: sentence-transformers
+base_model: ibm-granite/granite-embedding-97m-multilingual-r2
 tags:
 - sentence-transformers
 - feature-extraction
@@ -53,6 +54,8 @@ model-index:
 
 Fine-tuned from [`ibm-granite/granite-embedding-97m-multilingual-r2`](https://huggingface.co/ibm-granite/granite-embedding-97m-multilingual-r2) using a **Karpathy-style autoresearch loop** – 36 autonomous training iterations on an RTX 5090, each proposing and self-validating its own strategy. **35 out of 36 iterations improved the model (97% acceptance rate).**
 
+Built at [**TAMZ**](https://tamz.ai) – a Brazilian B2B sales intelligence platform that identifies, enriches, and delivers company leads ready for outreach. The training data comes directly from TAMZ's enrichment pipeline over 32M Brazilian companies from the Receita Federal.
+
 ---
 
 ## MTEB Benchmark Results (PT-BR)
@@ -186,7 +189,7 @@ embeddings = model.encode(texts, truncate_dim=128)
 ## Best For
 
 ✅ Semantic search over Brazilian business data
-✅ B2B lead discovery and company matching
+✅ B2B lead discovery and company matching (e.g. [TAMZ](https://tamz.ai))
 ✅ Company similarity, clustering, deduplication
 ✅ PT-BR RAG pipelines with business documents
 ✅ Memory systems for Portuguese AI agents
@@ -222,6 +225,7 @@ Apache 2.0 – same as the base IBM Granite Embedding model.
 publisher = {HuggingFace},
 url = {https://huggingface.co/calneymgp/braza-embedding-ptbr-v1},
 note = {Fine-tuned from IBM Granite 97M on 474K Brazilian B2B companies
-using 36-iteration autonomous training loop (RTX 5090)
+using 36-iteration autonomous training loop (RTX 5090).
+Built at TAMZ (https://tamz.ai)}
 }
 ```
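
The third hunk header quotes `embeddings = model.encode(texts, truncate_dim=128)` from the README. As a conceptual sketch of what truncating an embedding involves, the example below keeps the first `dim` components of a vector and rescales the result to unit length so that cosine comparisons remain meaningful. This is an assumption about the general technique, not a description of how the library or this model implements it; the helper name `truncate_and_normalize` and the sample vector are hypothetical.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Hypothetical 4-dimensional embedding, truncated to 2 dimensions.
full = [3.0, 4.0, 12.0, 0.5]
short = truncate_and_normalize(full, 2)
print(short)
```

Whether truncated vectors keep enough quality for a given task depends on how the model was trained, so any chosen dimension should be checked against the model card's own benchmark figures.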