SomosNLP

non-profit

https://somosnlp.org/

SomosNLP_

somosnlp


AI & ML interests

Democratize NLP in Spanish and encourage its application to generate social impact 💛

Recent Activity

mariagrandury authored a paper about 2 months ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

mariagrandury authored a paper about 2 months ago

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

reddrex updated a dataset 3 months ago

somosnlp/LingComp_QA


alvarobartt

posted an update about 1 month ago

Learn how to deploy Microsoft Research's VibeVoice ASR on Microsoft Azure Foundry with Hugging Face to generate rich audio transcriptions that capture who spoke, when, and what was said! 💥

> 🕒 60-minute single-pass processing, no chunking or stitching
> 👤 Customized hotwords to guide recognition on domain-specific content
> 📝 Rich transcription: joint ASR + diarization + timestamping in one pass
> 🌍 50+ languages with automatic detection and code-switching support
> 🤗 Deployed on Microsoft Foundry via an OpenAI-compatible Chat Completions API

https://huggingface.co/docs/microsoft-azure/foundry/examples/deploy-vibevoice-asr

lewtun

submitted 2 papers to Daily Papers 2 months ago

Single-minus gluon tree amplitudes are nonzero

Paper • arXiv:2602.12176 • Published Feb 12 • 8 upvotes

Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL

Paper • arXiv:2602.03773 • Published Feb 3 • 13 upvotes

alvarobartt

posted an update 3 months ago

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

Running uvx hf-mem --model-id ... --experimental automatically pulls the required information from the Hugging Face Hub and includes the KV cache estimate when applicable.

💡 Alternatively, you can set the --max-model-len, --batch-size and --kv-cache-dtype arguments manually (à la vLLM).
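The post does not spell out how hf-mem computes its estimate. As a rough guide, the usual back-of-the-envelope formula for a decoder that uses standard multi-head or grouped-query attention is 2 × layers × KV heads × head dimension × context length × batch size × bytes per element, where the factor 2 accounts for storing both keys and values. The minimal Python sketch below assumes that formula; the function name `kv_cache_bytes` and the dtype-to-bytes table are hypothetical, and the parameter names only mirror the --max-model-len, --batch-size and --kv-cache-dtype flags. It is not hf-mem's implementation, and it ignores architectures such as multi-head latent attention or sliding-window caches.

```python
# Bytes per element for a few KV cache dtypes (assumed mapping, not hf-mem's table).
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2, "fp8": 1}


def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    max_model_len: int,
    batch_size: int,
    kv_cache_dtype: str = "bfloat16",
) -> int:
    """Hypothetical helper: estimate KV cache size in bytes for an MHA/GQA decoder."""
    bytes_per_elem = DTYPE_BYTES[kv_cache_dtype]
    # Factor 2: each layer keeps one key tensor and one value tensor,
    # each of shape (batch_size, num_kv_heads, max_model_len, head_dim).
    return 2 * num_layers * batch_size * num_kv_heads * max_model_len * head_dim * bytes_per_elem


if __name__ == "__main__":
    # Illustrative Llama-3-8B-style dimensions: 32 layers, 8 KV heads, head_dim 128.
    size = kv_cache_bytes(32, 8, 128, max_model_len=8192, batch_size=1)
    print(f"{size / 2**30:.2f} GiB")  # prints "1.00 GiB"
```

Because the estimate scales linearly with both context length and batch size, quadrupling the batch while switching from bfloat16 to fp8 only doubles the memory in this sketch.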

reddrex

updated a dataset 3 months ago

somosnlp/LingComp_QA

Updated Jan 15 • 1k rows • 279 downloads • 1 like

pcuenq

posted an update 3 months ago

👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 — Learning to Reason
DeepSeek not only releases a top-notch reasoning model but also shows how to train such models and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 — Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 — "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models earn gold medals at math olympiads and post top scores on hard benchmarks. OpenAI releases strong open-source models, and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 — Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT-5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂

maximorulli

authored a paper 6 months ago

Attention Sinks in Diffusion Language Models

Paper • arXiv:2510.15731 • Published Oct 17, 2025 • 50 upvotes

mariagrandury

updated a dataset 7 months ago

somosnlp/recursos-pln-es

Updated Sep 18, 2025 • 183 rows • 40 downloads • 1 like

mariagrandury

published a dataset 7 months ago

somosnlp/recursos-pln-es

Updated Sep 18, 2025 • 183 rows • 40 downloads • 1 like

mariagrandury

updated a dataset 7 months ago

somosnlp/recursos-pln-es-models

Updated Sep 16, 2025 • 22 rows • 8 downloads

mariagrandury

published a dataset 7 months ago

somosnlp/recursos-pln-es-models

Updated Sep 16, 2025 • 22 rows • 8 downloads

mariagrandury

updated a Space 8 months ago

Leaderboard Retos Hackathon SomosNLP 2025

mariagrandury

published a dataset 10 months ago

somosnlp/babylm-es

Updated Jun 19, 2025 • 9 downloads

dvilasuero

posted an update 10 months ago

Super excited to launch Hugging Face Sheets: Spreadsheets meet AI and unstructured data.

A few months ago, we started imagining new ways to build and transform datasets with the latest open-source models.

Today, I'm thrilled to introduce our first step in this direction.

In a nutshell:

📁 Effortlessly run prompts and models over your data.
🌐 Agentic search for accuracy and real-time information.
🖼️ Familiar, minimalistic interface for interacting with data.
🎯 Human feedback 2.0: Your input directly improves generated data.
💯 Access hundreds of open models and leading inference providers.

Go to this space to try it out!

aisheets/sheets

Leave your questions below, we're just getting started!