Marathlish-MiniLM
This is a sentence embedding model for Marathlish (Marathi-English code-mixed text). It places Marathlish queries (e.g., "Bhau paise pathvayche") close to their English semantic equivalents (e.g., "Money Transfer") in a shared vector space, so the two can be matched by similarity.
It is designed for Indian developers building search engines, chatbots, or recommendation systems for users in India.
Quick Start
First, install the library:
pip install sentence-transformers
1. Basic Usage (The "Hello World")
This shows how the model converts text into numbers (vectors).
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer('anuragwagh0/marathlish-minilm')
# Encode sentences
sentences = ["Mala loan hava ahe", "I want a loan"]
embeddings = model.encode(sentences)
print(embeddings.shape)
# Output: (2, 384) -> Two sentences, each is a vector of size 384
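To make "comparing vectors" concrete, here is a minimal sketch of cosine similarity in plain NumPy, the measure used for matching later in this card. The three vectors are made-up stand-ins, not real model output; with the real model you would pass rows of `embeddings` instead.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up 3-dimensional stand-ins for sentence embeddings.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a, twice the length
vec_c = [-3.0, 0.0, 1.0]  # orthogonal to vec_a

print(round(cosine_similarity(vec_a, vec_b), 6))  # 1.0
print(round(cosine_similarity(vec_a, vec_c), 6))  # 0.0
```

Because the score depends only on direction, not length, two sentences with the same meaning can score close to 1.0 even if their raw vectors differ in magnitude.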
2. Real World Example: Building a Semantic Router
This is how you use the model to "route" user queries to the right function in your app.
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('anuragwagh0/marathlish-minilm')
# 1. Define your App's Capabilities (The "Targets")
app_actions = [
    "Check Account Balance",
    "Transfer Money",
    "Call Customer Support"
]
# 2. Encode your actions (Do this once on startup)
action_vectors = model.encode(app_actions)
# 3. Simulate a User Query
user_query = "Bhau paise pathvayche" # Marathlish Input
query_vector = model.encode(user_query)
# 4. Find the best match
scores = util.cos_sim(query_vector, action_vectors)[0]
best_match_idx = int(scores.argmax())  # convert the 0-d tensor to a plain int
best_action = app_actions[best_match_idx]
print(f"User said: '{user_query}'")
print(f"Bot Action: {best_action}")
# Output: Bot Action: Transfer Money
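An argmax always picks some action, even for a query unrelated to all of them. One way to guard against that is a minimum-score cutoff. The sketch below assumes a hypothetical `route` helper and a hypothetical threshold of 0.5; that value is not from this model card and would need tuning on real queries. The `encode` argument is any function that returns one vector per input string, such as `model.encode`. A toy keyword encoder stands in here so the example runs without downloading the model.

```python
import numpy as np

def route(query, actions, encode, threshold=0.5):
    """Return (best_action, score), or (None, score) if the score is too low.

    `route` and `threshold` are illustrative names, not part of any library.
    """
    action_vecs = np.asarray(encode(actions), dtype=float)
    query_vec = np.asarray(encode([query]), dtype=float)[0]
    # Cosine similarity of the query against every action.
    norms = np.linalg.norm(action_vecs, axis=1) * np.linalg.norm(query_vec)
    scores = action_vecs @ query_vec / np.maximum(norms, 1e-12)
    best = int(np.argmax(scores))
    score = float(scores[best])
    return (actions[best] if score >= threshold else None), score

# Toy stand-in encoder: one dimension per concept, set to 1.0 if any of the
# concept's keywords appears in the text. Replace with `model.encode`.
CONCEPTS = {
    "balance": ["balance"],
    "transfer": ["money", "paise"],
    "support": ["support"],
}

def toy_encode(texts):
    return [
        [float(any(w in text.lower() for w in words)) for words in CONCEPTS.values()]
        for text in texts
    ]

actions = ["Check Account Balance", "Transfer Money", "Call Customer Support"]
print(route("Bhau paise pathvayche", actions, toy_encode))
print(route("What is the weather today", actions, toy_encode))
```

When `route` returns `None`, the app can fall back to a clarifying question or a human agent instead of triggering the wrong action.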
Use Cases
- Customer Support Chatbots: Understand "Order kadhi yeil?" -> "Track Order".
- E-Commerce Search: Understand "Swast boot" -> "Cheap Shoes".
- Content Recommendation: Match Marathlish comments to English video tags.
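For the search and recommendation use cases, a common pattern is to embed the catalogue once and rank items by similarity to each query. A minimal sketch, assuming a hypothetical `top_k` helper, a made-up three-item catalogue, and made-up 2-dimensional vectors in place of real 384-dimensional embeddings:

```python
import numpy as np

def top_k(query_vec, item_vecs, k=2):
    """Indices and cosine scores of the k items most similar to the query."""
    items = np.asarray(item_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)
    # Normalise to unit length so a dot product equals cosine similarity.
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = items @ query
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]

# Made-up data: in practice these vectors would come from model.encode(...).
catalogue = ["Cheap Shoes", "Luxury Watch", "Budget Sandals"]
item_vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
query_vec = [1.0, 0.1]  # hypothetical embedding of "Swast boot"

for idx, score in top_k(query_vec, item_vecs):
    print(catalogue[idx], round(score, 3))
```

For larger catalogues, the item vectors can be normalised and stored once, leaving only the matrix product per query.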
Performance
- Base Model: all-MiniLM-L6-v2 (Lightweight, ~80MB)
- Language: Marathi + English (Code-Mixed / Marathlish)
- Dimensions: 384
Training
Fine-tuned using sentence-transformers on a synthetic dataset of 5000+ banking, tech support, and casual conversation pairs.