Inference Endpoints Images

community

https://endpoints.huggingface.co/

AI & ML interests

Hugging Face Inference Endpoints Images repository allows AI Builders to collaborate and engage creating awesome inference deployments

Recent Activity

AdinaY submitted a paper 2 days ago

TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

AdinaY submitted a paper 19 days ago

KAT-Coder-V2 Technical Report

AdinaY submitted a paper about 1 month ago

Training Language Models via Neural Cellular Automata

View all activity

AdinaY

submitted a paper to Daily Papers 2 days ago

TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

Paper • 2604.14531 • Published 3 days ago • 6

qgallouedec

posted an update 2 days ago

Post

1599

TRL v1.2 introduces the SSDTrainer 🚀

Simple Self-Distillation (SSD) from Apple's paper "Embarrassingly Simple Self-Distillation Improves Code Generation" is now available as an experimental trainer in TRL.

The recipe is as minimal as the name suggests: sample completions from the model itself at a training-time temperature, then fine-tune on those raw, unverified samples with plain cross-entropy. No reward model. No verifier. No teacher model. No reinforcement learning. Just prompts and the model.

from trl.experimental.ssd import SSDConfig, SSDTrainer

trainer = SSDTrainer(
    model="Qwen/Qwen3-4B-Instruct",
    args=SSDConfig(temperature=0.6, top_k=20, top_p=0.95),
    train_dataset=dataset,
)
trainer.train()

v1.2 also ships expanded tool-calling support (LLaMA 3.1 / 3.2, DeepSeek-V3), another round of KTO ↔ DPO alignment getting us closer to promoting KTO to stable, a big GRPO simplification for overlong tool results, deprecation of use_transformers_paged, and key fixes for VLM response parsing.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.2.0

qgallouedec

posted an update 19 days ago

Post

2305

TRL v1.0 is out!

Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.

The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.

What's in v1.0:
Deep Hugging Face integration, low infrastructure burden
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.

pip install --upgrade trl

KAT-Coder-V2 Technical Report

Paper • 2603.27703 • Published 21 days ago • 10

AdinaY

submitted 2 papers to Daily Papers about 1 month ago

Training Language Models via Neural Cellular Automata

Paper • 2603.10055 • Published Mar 9 • 8

Planning in 8 Tokens: A Compact Discrete Tokenizer for Latent World Model

Paper • 2603.05438 • Published Mar 5 • 40

alvarobartt

posted an update about 1 month ago

Post

3603

Learn how to deploy Microsoft Research VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions with Who, When, and What! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr

qgallouedec

posted an update about 2 months ago

Post

3007

@CohereLabs just released 🌿 Tiny Aya: a fully open-source 3B parameter model that speaks 70+ languages 🌍! But there’s a catch:

Tiny Aya is just a language model. It doesn’t support tool calling, the key capability that turns frontier models into powerful *agents*.
So the real question is:

How hard is it to turn Tiny Aya into an agent?

Turns out… it’s simple, thanks to Hugging Face TRL.
We’re sharing a hands-on example showing how to train Tiny Aya to turn it into a tool-calling agent using TRL, unlocking what could become the first *massively multilingual open agent*.

Small model. Global reach. Agent capabilities.

👉 https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb

1 reply

AdinaY

posted an update 2 months ago

Post

3630

MiniMax M2.5 is now available on the hub 🚀

MiniMaxAI/MiniMax-M2.5

✨ 229B - Modified MIT license
✨37% faster than M2.1
✨ ~$1/hour at 100 TPS

2 replies

AdinaY

posted an update 2 months ago

Post

739

RynnBrain 🤖 a physics aware embodied brain for robots from Alibaba DAMO

https://huggingface.co/collections/Alibaba-DAMO-Academy/rynnbrain

✨ 2B/8B/30B (3B active)
✨ Apache 2.0
✨ Understands egocentric scenes with strong spatial awareness
✨ Tracks objects and motion over time

2 replies

AdinaY

posted an update 2 months ago

Post

4119

Game on 🎮🚀

While Seedance 2.0’s videos are all over the timeline, DeepSeek quietly pushed a new model update in its app.

GLM-5 from Z.ai adds more momentum.

Ming-flash-omni from Ant Group , MiniCPM-SALA from OpenBMB
, and the upcoming MiniMax M2.5 keep the heat on 🔥

Spring Festival is around the corner,
no one’s sleeping!

✨ More releases coming, stay tuned
https://huggingface.co/collections/zh-ai-community/2026-february-china-open-source-highlights

AdinaY

posted an update 2 months ago

Post

3949

Ming-flash-omni 2.0 🚀 New open omni-MLLM released by Ant Group

inclusionAI/Ming-flash-omni-2.0

✨ MIT license
✨ MoE - 100B/6B active
✨ Zero-shot voice cloning + controllable audio
✨ Fine-grained visual knowledge grounding

2 replies

AdinaY

posted an update 2 months ago

Post

801

LLaDA 2.1 is out 🔥 A new series of MoE diffusion language model released by AntGroup

inclusionAI/LLaDA2.1-mini
inclusionAI/LLaDA2.1-flash

✨LLaDA2.1-mini: 16B - Apache2.0
✨LLaDA2.1-flash: 100B - Apache2.0
✨Both delivers editable generation, RL-trained diffusion reasoning and fast inference

2 replies

AdinaY

posted an update 2 months ago

Post

2619

AI for science is moving fast🚀

Intern-S1-Pro 🔬 a MoE multimodal scientific reasoning model from Shanghai AI Lab

internlm/Intern-S1-Pro

✨ 1T total / 22B active
✨ Apache 2.0
✨ SoTA scientific reasoning performance
✨ FoPE enables scalable modeling of long physical time series (10⁰–10⁶)

2 replies

AdinaY

posted an update 2 months ago

Post

1408

✨ China’s open source AI ecosystem has entered a new phase

https://huggingface.co/blog/huggingface/one-year-since-the-deepseek-moment-blog-3

One year after the “DeepSeek Moment,” open source has become the default. Models, research, infrastructure, and deployment are increasingly shared to support large-scale, system-level integration.

This final blog examines how leading Chinese AI organizations are evolving ,and what this implies for the future of open source.

AdinaY

posted an update 3 months ago

Post

415

GLM just entered the OCR field🔥

zai-org/GLM-OCR

✨ 0.9B
✨ MIT licensed
✨ Multimodal GLM-V architecture
✨ #1 on OmniDocBench v1.5 (94.62)

AdinaY

posted an update 3 months ago

Post

1625

Step 3.5 Flash 🔥 new foundation model from StepFun ai

https://huggingface.co/collections/stepfun-ai/step-35-flash

✨ Sparse MoE：196B/11B active
✨ Supports up to 256K context
✨ Multi-token prediction for fast decoding (100–300 tok/s)
✨ Runs locally on consumer hardware

AdinaY

posted an update 3 months ago

Post

1141

What a week 🤯

Following DeepSeek, Kimi, Qwen, Baidu, and Ant Group, Unitree Robotics
has now released a VLA model on the hub too!

unitreerobotics/UnifoLM-VLA-Base

alvarobartt

posted an update 3 months ago

Post

3221

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.

1 reply

AdinaY

posted an update 3 months ago

Post

323

LongCat-Flash-Lite🔥 a non-thinking MoE model released by Meituan LongCat team.

meituan-longcat/LongCat-Flash-Lite

✨ Total 68.5B / 3B active - MIT license
✨ 256k context
✨ Faster inference with N-gram embeddings

AI & ML interests

Recent Activity

Team members 27

hfendpoints-images's activity