Susant Achary
Susant-Achary
AI & ML interests
Tiny to Small Language Models, Building from India. Quantization and MLX
Recent Activity
reacted to Shrijanagain's post with ➕ 27 days ago
Surya-1.1T: Scaling Beyond Human-Level Reasoning via 146 Trillion Token Pre-training
Author: SKT AI LABS
Affiliation: SKT AI Labs / Project Surya
Model Architecture: Optimized Dense Transformer
Parameters: 1.1 Trillion
Training Tokens: 146 Trillion
Want to collaborate with us, friends? Let's start the journey: we have collected 146 trillion tokens and completed pre-training, but we need to make the model more powerful.
Whitepaper - https://github.com/SHRIJANAGAIN/PROFF
reacted to Shrijanagain's post with 🔥 27 days ago
liked a model 5 months ago
mlx-community/medgemma-27b-it-8bit
<7B Best of MoE 🧠
A collection of small mixture-of-experts (MoE) models with outsized impact.
- LiquidAI/LFM2-8B-A1B
  Text Generation • 8B • 64.8k • 351
- ibm-granite/granite-4.0-h-tiny
  Text Generation • 7B • 81.9k • 198
- microsoft/Phi-4-multimodal-instruct
  Automatic Speech Recognition • 6B • 329k • 1.58k
- google/gemma-3n-E4B-it
  Image-Text-to-Text • 39.5k • 900
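Names such as LFM2-8B-A1B follow the common convention of total parameters followed by approximate active parameters per token. That gap is what makes these MoE models cheap to run. Below is a minimal sketch of generic top-k gating. It assumes a plain softmax router and toy scalar "experts" that are purely hypothetical. It is not the actual routing code of any model listed here.

```python
import math
from typing import Callable, List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over router scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x: float, router_logits: List[float],
                experts: List[Callable[[float], float]], k: int = 2) -> float:
    """Send one token to its top-k experts and mix their outputs.

    Only the k selected experts are evaluated, which is why the active
    parameter count per token is much smaller than the total.
    """
    probs = softmax(router_logits)
    # Indices of the k highest-probability experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected gate weights so they sum to 1.
    norm = sum(probs[i] for i in top)
    return sum((probs[i] / norm) * experts[i](x) for i in top)

# Hypothetical toy "experts"; in a real model each one is a feed-forward block.
experts = [lambda x: x * 2, lambda x: x + 10, lambda x: -x, lambda x: x ** 2]
router_logits = [2.0, 1.0, 0.0, 3.0]  # hypothetical router scores for one token
print(moe_forward(3.0, router_logits, experts, k=2))
```

With k=2, only experts 3 and 0 run, and the other two are skipped entirely for this token.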
Audio Features
Feature Extraction with 🧠 Text Embeddings
Models for turning text, images, audio (and combinations) into useful vectors or feature maps. Ideal for search/RAG, clustering, recommendation, and retrieval.
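The sketch below shows how such vectors are typically used for search/RAG: documents are ranked by cosine similarity to a query vector. The 3-dimensional vectors are hypothetical hand-written stand-ins. Real embeddings would come from one of the models in this collection and would have hundreds of dimensions.

```python
import math
from typing import Dict, List, Tuple

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: List[float], docs: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings standing in for real model output.
docs = {
    "quantization-guide": [0.9, 0.1, 0.0],
    "tts-overview": [0.0, 0.2, 0.9],
    "mlx-on-mac": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, docs))
```

In a RAG pipeline, the retrieved ids would map back to text chunks that are passed to a generator model as context.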
🪶 Sept ’25 <Text Generation Language Models> (Top Releases)
Coding models and pipelines released this month that improve repo-level reasoning, GUI automation, and tool use, with a focus on practical editing.
🖼️ Text2Image, i2i: September ’25 (Top Releases)
Cutting-edge image generation & VLM updates from September ’25. This collection spotlights models that improved text rendering, layout control & more.
📄➡️🔊 Text-to-Speech (TTS)
Speech synthesis models that turn text into natural audio. Includes multilingual TTS, low-latency real-time models, and voice-cloning variants.
📚➡️🎨Text-to-Image
State-of-the-art diffusion and generative models that turn text prompts into detailed images. Includes lightweight CPU-friendly and photorealistic models.
🎨➡️✍️ Image-to-Text
OCR, captioning, and visual QA models that turn images into descriptive or structured text.
- Salesforce/blip-image-captioning-base
  Image-to-Text • 2.22M • 847
- Salesforce/blip-image-captioning-large
  Image-to-Text • 0.5B • 1.41M • 1.47k
- nlpconnect/vit-gpt2-image-captioning
  Image-to-Text • 214k • 927
- microsoft/trocr-base-handwritten
  Image-to-Text • 0.3B • 153k • 490
🌀 Any-to-Any Multimodal Models
Models that can flexibly convert across modalities (text, image, audio, video). Ideal for researchers exploring unified multimodal AI.
👨‍💻 Mathematical Reasoning 🧮
Datasets tackling some of AI's toughest challenges.
🧩 Long-Context Coding Models (≥128k)
10 coding models that support ≥128k context (natively or via officially documented scaling).
- meta-llama/Llama-3.1-8B-Instruct
  Text Generation • 8B • 9.55M • 5.7k
- google/gemma-3-4b-it
  Image-Text-to-Text • 1.76M • 1.31k
- Qwen/Qwen3-Coder-30B-A3B-Instruct
  Text Generation • 31B • 1.83M • 1.01k
- deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
  Text Generation • 16B • 438k • 581
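A model that supports ≥128k tokens still has to hold the key/value cache for those tokens, and that cache grows linearly with context length. Below is a back-of-the-envelope sketch that assumes grouped-query attention and a 16-bit (fp16/bf16) cache at 2 bytes per value. The example settings (32 layers, 8 KV heads, head dimension 128) match the published Llama-3.1-8B configuration, but check each model's own config before relying on the numbers.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_value: int = 2,
                   batch: int = 1) -> int:
    """Bytes needed to cache keys and values for `batch` sequences.

    The leading factor 2 counts one tensor for keys and one for values,
    in every layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value * batch

GIB = 1024 ** 3
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=131_072)
print(f"{size / GIB:.1f} GiB")  # → 16.0 GiB
```

That is on top of the model weights. Halving `bytes_per_value`, for example with an 8-bit KV cache on runtimes that support one, halves the figure.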
🧩 Long-Context Models (≥128k) under 8B
Qwen3
The best of the Qwen3 model series.
- Qwen/Qwen3-30B-A3B-Instruct-2507
  Text Generation • 1.04M • 799
- Qwen/Qwen3-Next-80B-A3B-Thinking
  Text Generation • 34.3k • 487
- Qwen/Qwen3-Coder-30B-A3B-Instruct
  Text Generation • 31B • 1.83M • 1.01k
- Qwen/Qwen3-Omni-30B-A3B-Instruct
  Any-to-Any • 35B • 369k • 907
🛩️Qwen3-VL
The most powerful vision-language model in the Qwen series to date, available in dense and MoE architectures.
- Qwen/Qwen3-VL-30B-A3B-Thinking
  Image-Text-to-Text • 31B • 88k • 197
- mlx-community/Qwen3-VL-30B-A3B-Instruct-4bit
  Image-Text-to-Text • 355 • 7
- mlx-community/Qwen3-VL-30B-A3B-Instruct-8bit
  Image-Text-to-Text • 100 • 3
- mlx-community/Qwen3-VL-8B-Instruct-4bit
  Image-Text-to-Text • 1.97k • 5
🍎 MLX-Quantized Models (3/4/5/6-bit) Mac & iOS
Curated MLX-ready quantized LLMs that run fast on Apple Silicon (some also run on iOS). Each model card lists bits, group size, peak unified memory (Peak UM, in GB), and stable context length.
- mlx-community/Apriel-1.5-15b-Thinker-3bit-MLX
  Image-Text-to-Text • 7
- mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX
  Image-Text-to-Text • 39 • 1
- mlx-community/granite-4.0-h-tiny-3bit-MLX
  Text Generation • 0.9B • 182 • 2
- mlx-community/granite-4.0-tiny-preview-4bit
  Text Generation • 10
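Bit width and group size account for most of a quantized model's weight footprint. Here is a rough estimate that assumes MLX-style affine (group-wise) quantization, where each group of weights stores one 16-bit scale and one 16-bit bias. The 15B parameter count is a hypothetical example. Actual peak unified memory will be higher, because it also covers activations, the KV cache, and any layers left unquantized.

```python
def quantized_weight_gb(n_params: float, bits: int, group_size: int,
                        scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Approximate weight storage in GB (10^9 bytes) for group-wise quantization.

    Each group of `group_size` weights carries one scale and one bias, so the
    effective cost per weight is bits + (scale_bits + bias_bits) / group_size.
    """
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical 15B-parameter model at 3, 4 and 6 bits with group size 64.
for bits in (3, 4, 6):
    print(bits, "bit:", round(quantized_weight_gb(15e9, bits, 64), 2), "GB")
```

With group size 64, the assumed scale and bias overhead adds 0.5 bits per weight. Smaller groups improve accuracy but increase that overhead.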
🖼️ Vision Backbones & Image Embeddings
- facebook/dinov2-base
  Image Feature Extraction • 86.6M • 1.52M • 176
- openai/clip-vit-large-patch14-336
  Zero-Shot Image Classification • 16.7M • 301
- google/siglip-so400m-patch14-384
  Zero-Shot Image Classification • 0.9B • 2.14M • 669
- BAAI/EVA-CLIP-8B
  Feature Extraction • 1.26k • 50
🧊 Sept ’25 <Image-to-3D> (Top Releases)
Models that turn a single image (or an image plus a prompt) into 3D assets such as meshes, Gaussians, or point clouds, suited for AR/VR, product turntables, and game props.
🎬 ✍️ Sept ’25 <Video & Text2Video> (Top Releases)
Open text-to-video (T2V) and animation models emphasizing temporal coherence, controllability, and real-time playback. A good starting point for creative tools and ads.
Top Apache 2.0-Licensed Models
Free and open source, provided you retain the original attribution and do not claim the models as your own.
- openai/whisper-large-v3
  Automatic Speech Recognition • 2B • 4.82M • 5.58k
- facebook/wav2vec2-base-960h
  Automatic Speech Recognition • 94.4M • 1.24M • 395
- openai/whisper-small
  Automatic Speech Recognition • 1.97M • 548
- openai/whisper-tiny
  Automatic Speech Recognition • 763k • 425
✍️➡️🎬 Text-to-Video
Models that create short videos from written prompts. Well suited to experimentation in generative video and creative storytelling.
🖌️ Image-to-Image
Image editing and transformation models, from style transfer to super-resolution, inpainting, and diffusion-based edits.
🖼️➡️📚 Image-Text-to-Text
Multimodal models that take image + text as input and produce natural language output. Use cases: chart QA, visual document reasoning, VQA.
- Qwen/Qwen2.5-VL-7B-Instruct
  Image-Text-to-Text • 8B • 4.52M • 1.49k
- Qwen/Qwen2.5-VL-3B-Instruct
  Image-Text-to-Text • 4B • 6.31M • 634
- google/gemma-3-4b-it
  Image-Text-to-Text • 1.76M • 1.31k
- nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
  Image-Text-to-Text • 998k • 177
✍️ Text Generation
A collection of top open LLMs for writing, summarization, chat, reasoning, and document drafting. Includes small language models (SLMs) for on-device use as well as large models.
🧠 General-Purpose Datasets (<10M samples)
Datasets covering 🌐 chat, ⚡ code, and 🧮 reasoning.
🍎 MLX-Ready LLMs
Models with MLX weights, tested for MLX inference.
- mlx-community/gpt-oss-20b-MXFP4-Q8
  Text Generation • 21B • 596k • 50
- lmstudio-community/Seed-OSS-36B-Instruct-MLX-4bit
  Text Generation • 36B • 44.7k
- lmstudio-community/Qwen3-4B-Thinking-2507-MLX-4bit
  Text Generation • 0.6B • 73.6k • 11
- mlx-community/parakeet-tdt-0.6b-v2
  Automatic Speech Recognition • 605k • 40
📱 On-Device-Ready SLMs (≤4B)
Tiny, fast models that run on iPhone/iPad or Mac with very low memory. Great for quick replies, offline note-assist, and routing.
- lmstudio-community/Qwen3-4B-Thinking-2507-MLX-8bit
  Text Generation • 1B • 72.1k • 7
- lmstudio-community/DeepSeek-R1-0528-Qwen3-8B-MLX-4bit
  Text Generation • 1B • 346k • 8
- lmstudio-community/gemma-3n-E4B-it-MLX-4bit
  Image-Text-to-Text • 338k • 2
- mlx-community/gemma-3-4b-it-qat-4bit
  Image-Text-to-Text • 1.04M • 7
GPT2-JungleBook-from-Scratch-Models
The primary objective of this project is to explore and analyze how model size affects text generation quality, using a GPT-2 architecture trained from scratch.
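When comparing model sizes, it helps to see how the GPT-2 parameter count scales with depth (`n_layer`) and width (`d_model`). This sketch assumes the standard GPT-2 layout: learned positional embeddings, a 4x MLP expansion, biases in every linear layer, LayerNorms with weight and bias, and an output head tied to the token embedding. The project's own configurations are not given here, so the example uses the original GPT-2 small and medium settings.

```python
def gpt2_param_count(n_layer: int, d_model: int, vocab_size: int = 50257,
                     n_ctx: int = 1024) -> int:
    """Parameter count of a GPT-2-style decoder with a tied LM head."""
    token_emb = vocab_size * d_model  # token embedding (wte)
    pos_emb = n_ctx * d_model         # learned position embedding (wpe)
    # Per block: attention QKV (3d^2 + 3d), attention output (d^2 + d),
    # MLP up-projection (4d^2 + 4d), MLP down-projection (4d^2 + d),
    # and two LayerNorms (2 * 2d), giving 12d^2 + 13d in total.
    per_block = 12 * d_model ** 2 + 13 * d_model
    final_ln = 2 * d_model
    return token_emb + pos_emb + n_layer * per_block + final_ln

print(gpt2_param_count(n_layer=12, d_model=768))   # → 124439808
print(gpt2_param_count(n_layer=24, d_model=1024))  # → 354823168
```

The 12d² term per block dominates as width grows. At small widths, the vocabulary embedding still accounts for a large share of the total.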
Vision-LM