duckdb-nsql-hub (DuckDB Text-2-SQL Bench)

cfahlgren1

submitted a paper to Daily Papers 3 months ago

How AI Impacts Skill Formation

Paper • 2601.20245 • Published Jan 28 • 10

cfahlgren1

updated 2 datasets 7 months ago

duckdb-nsql-hub/duckdb-nsql-predictions

Viewer • Updated Sep 29, 2025 • 4.88k • 10

duckdb-nsql-hub/duckdb-nsql-scores

Viewer • Updated Sep 29, 2025 • 142 • 66

cfahlgren1

posted an update 10 months ago

Post

1132

I ran the Anthropic Misalignment Framework for a few top models and added it to a dataset: cfahlgren1/anthropic-agentic-misalignment-results

You can read the reasoning traces of the models trying to blackmail the user and perform other actions. It's very interesting!!

cfahlgren1

posted an update 11 months ago

Post

419

Really nice to see AllenAI drop the Reward-Bench-2 dataset and leaderboard from their new paper all on the hub! 👏

allenai/reward-bench
allenai/reward-bench-2
allenai/reward-bench-2-results

Great work @natolambert , allenai and others!! 🤗

cfahlgren1

posted an update 11 months ago

Post

1740

Yesterday, we dropped a new conversational viewer for datasets on the hub! 💬

Actually being able to view and inspect your data is extremely important. This is a big step in making data more accessible and actionable for everyone.

Here's some datasets you can try it out on:
• mlabonne/FineTome-100k
• Salesforce/APIGen-MT-5k
• open-thoughts/OpenThoughts2-1M
• allenai/tulu-3-sft-mixture

Any other good ones?

1 reply

·

tdoehmen

updated a Space about 1 year ago

DuckDB SQL Eval

🦆

6

Run SQL evaluation with model inference

cfahlgren1

authored a paper about 1 year ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 258

cfahlgren1

posted an update about 1 year ago

Post

2357

If you haven't seen yet, we just released Inference Providers 🔀

> 4 new serverless inference providers on the Hub 🤯
> Use your HF API key or personal key with all providers 🔑
> Chat with Deepseek R1, V3, and more on HF Hub 🐋
> We support Sambanova, TogetherAI, Replicate, and Fal.ai 💪

Best of all, we don't charge any markup on top of the provider 🫰 Have you tried it out yet? HF Pro accounts get $2 of free usage for the provider inference.

cfahlgren1

posted an update over 1 year ago

Post

1793

Wow, I just added Langfuse tracing to the Deepseek Artifacts app and it's really nice 🔥

It allows me to visualize and track more things along with the cfahlgren1/react-code-instructions dataset.

It was just added as a one click Docker Space template, so it's super easy to self host 💪

cfahlgren1

posted an update over 1 year ago

Post

2283

You'll notice the AI in the SQL Console is much better at working with chatml conversations:

Here's example of unnesting the cfahlgren1/react-code-instructions in less than 10 seconds by asking it. Check it out here: cfahlgren1/react-code-instructions

- "show me the average assistant response length"
- "extract user, system, and assistant messages into separate columns"

It's super easy to work with conversational datasets now with natural language 🗣️

2 replies

·

cfahlgren1

posted an update over 1 year ago

Post

3356

The deepseek-ai/DeepSeek-V3 is very good! I have been playing with it and found it is really good at one-shotting a pretty good landing page.

You can play with it here: https://deepseek-artifacts.vercel.app

All the responses get saved in the cfahlgren1/react-code-instructions dataset. Hopefully we can build one of the biggest, highest quality frontend datasets on the hub 💪

cfahlgren1

updated a Space over 1 year ago

DuckDB NSQL Leaderboard

📊

7

Display DuckDB NSQL leaderboard scores

cfahlgren1

posted an update over 1 year ago

Post

1948

You can just ask things 🗣️

"show me messages in the coding category that are in the top 10% of reward model scores"

Download really high quality instructions from the Llama3.1 405B synthetic dataset 🔥

argilla/magpie-ultra-v1.0

cfahlgren1

updated a dataset over 1 year ago

duckdb-nsql-hub/sql-console-prompt

Viewer • Updated Dec 3, 2024 • 1 • 31 • 9

cfahlgren1

posted an update over 1 year ago

Post

3085

We just dropped an LLM inside the SQL Console 🤯

The amazing, new Qwen/Qwen2.5-Coder-32B-Instruct model can now write SQL for any Hugging Face dataset ✨

It's 2025, you shouldn't be hand writing SQL! This is a big step in making it where anyone can do in depth analysis on a dataset. Let us know what you think 🤗

cfahlgren1

posted an update over 1 year ago

Post

942

observers 🔭 - automatically log all OpenAI compatible requests to a dataset💽

• supports any OpenAI compatible endpoint 💪
• supports DuckDB, Hugging Face Datasets, and Argilla as stores

> pip install observers

No complex framework. Just a few lines of code to start sending your traces somewhere. Let us know what you think! @davidberenstein1957 and I will continue iterating!

Here's an example dataset that was logged to Hugging Face from Ollama: cfahlgren1/llama-3.1-awesome-chatgpt-prompts

cfahlgren1

posted an update over 1 year ago

Post

880

You can create charts, leaderboards, and filters on top of any Hugging Face dataset in less than a minute

• ASCII Bar Charts 📊
• Powered by DuckDB WASM ⚡
• Download results to Parquet 💽
• Embed and Share results with friends 📬

Do you have any interesting queries?

cfahlgren1

posted an update over 1 year ago

Post

755

What rank are you on Hugging Face Top Yappers? 🗣️

Find your rank here with this link: cfahlgren1/hub-stats

The Top 3:
- @fdaudens
- @singhsidhukuldeep
- @akhaliq

I am at #71 and need to get my numbers up! 📈

4 replies

·

cfahlgren1

posted an update over 1 year ago

Post

3286

You can clean and format datasets entirely in the browser with a few lines of SQL.

In this post, I replicate the process @mlabonne used to clean the new microsoft/orca-agentinstruct-1M-v1 dataset.

The cleaning process consists of:
- Joining the separate splits together / add split column
- Converting string messages into list of structs
- Removing empty system prompts

https://huggingface.co/blog/cfahlgren1/the-beginners-guide-to-cleaning-a-dataset

Here's his new cleaned dataset: mlabonne/orca-agentinstruct-1M-v1-cleaned

1 reply

·

AI & ML interests

Team members 2

duckdb-nsql-hub's activity

DuckDB SQL Eval

DuckDB NSQL Leaderboard