<div align="center">

<img src="https://cdn-avatars.huggingface.co/v1/production/uploads/63aa4990769a10efc403771c/-hPclrsYl0IW6kqD2DWBL.png" width="100" alt="DATUMO logo"/>

# ⭐ DATUMO
### *The Data-centric AI Company*

**Built by [Selectstar](https://selectstar.ai/) — data infrastructure for trustworthy AI**

[![Website](https://img.shields.io/badge/🌐_Website-selectstar.ai-4f46e5?style=for-the-badge)](https://selectstar.ai/)
[![Blog](https://img.shields.io/badge/📰_Blog-Read-0ea5e9?style=for-the-badge)](https://selectstar.ai/blog/)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://kr.linkedin.com/company/datumo-usa)
[![Contact](https://img.shields.io/badge/Contact-Us-EA4335?style=for-the-badge&logo=gmail&logoColor=white)](https://selectstar.ai/contact_page/)

</div>

## 👋 About Us

We're **Selectstar** — a Korean AI company building the **data foundation for trustworthy AI**.
Since 2018, we've partnered with AI teams across the entire data value chain, from dataset design and construction to **LLM reliability evaluation and red-teaming**.

Our flagship **Datumo Platform** is Korea's first end-to-end AI trust evaluation solution, unifying dataset preparation, automated evaluation, red-teaming, and improvement analytics in a single pipeline.

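> 🇰🇷 **Hello, we are Selectstar.**
> We are a **Data-centric AI company** that works with you at every stage of AI development, from data design and construction to LLM reliability verification.
> On this page, we open-source the datasets and models we use in our research and production work.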

---

## 🎯 What We Do

Across every stage of AI's evolution, from Perception AI (2018~) through Generative AI (2022~) to **Agentic AI (2026~)**, we provide an **end-to-end pipeline** that runs from data construction to reliability verification.

| 🗂️ Data Construction | 🛡️ AI Trust & Safety | 📊 Datumo Platform |
|---|---|---|
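| High-difficulty reasoning data generation (CAC-CoT, GRADE, ATA, CoBA) | LLM red-teaming (CAGE, STAR-Teaming) | Korea's first automated LLM reliability evaluation platform |
| Pre-training and fine-tuning data licensing | Korean-language safety benchmarks (KorNAT, KorSET, FinRED) | **Evaluation time: 45 days → 45 minutes** |
| RAG knowledge pipelines | Safety Judge (Datumo-Guard) | On-premises and air-gapped (network-separated) environment support |
| 250,000+ crowdworkers · 200M+ annotations | Domain-specific evaluation for finance, healthcare, and the public sector | Dashboard Analytics & Reporting |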

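> 🤝 **Key partnerships**: SKT consortium for Korea's Independent AI Foundation Model project (독파모) · GSMA Open Telco AI · Samsung Life Insurance C-Lab Outside · Financial Security Institute · Ministry of Food and Drug Safety medical red team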

---

## 📚 Featured Collections

### 🛡️ [Safety-Data](https://huggingface.co/collections/datumo/safety-data)

Curated by our **AI Safety team** — Korean-language safety and reliability benchmarks for LLM evaluation.

| Dataset | Description | Venue |
|---|---|---|
| 🔸 **FinRED** | **Red-teaming** evaluation benchmark for financial-domain LLMs (built jointly with the AI Innovation Office of the Financial Security Institute) | KDD 2026 D&B Track |
| 🔸 [**KorSET**](https://huggingface.co/datasets/datumo/KorSET) | Korean red-teaming benchmark built with the CAGE framework (5 risk domains · 12 categories · 53 sub-types · ~8,000 items) | **ICLR 2026** (CAGE) |
| 🔸 [**KorNAT**](https://huggingface.co/datasets/datumo/KorNAT) | Korea's first LLM reliability / national-alignment benchmark | ACL 2024 Findings |

### 📦 [Data-Data](https://huggingface.co/collections/datumo/data-data)

Research outputs from our **Data team** — models and datasets built in-house.

| Resource | Description | Type |
|---|---|---|
| 🔸 [**CAC-CoT dataset**](https://huggingface.co/datasets/datumo/CAC-CoT) | Accompanying training data for CAC-CoT | Dataset |

---

## 🏆 Milestones

**Highlights (recent key achievements)**

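- 🏅 **Forbes "30 Under 30 Asia" 2021**: Enterprise Technology (all four co-founders selected)
- 🏅 Selected for **Forbes Korea "2025 Korea AI 50"**
- 🏅 Selected for **Forbes Asia "100 to Watch" 2025**
- 🇰🇷 Passed the first-round evaluation of Korea's **Independent AI Foundation Model project (독파모)** (2026.01, data lead in the SKT consortium)
- 🌐 Joined **GSMA Open Telco AI** as an official partner (2026.03, MWC Barcelona)
- 💰 Cumulative funding **surpassed KRW 43.4 billion** (2025.12, Series B extension)
- 📈 **200M+** cumulative annotations · **287+** enterprise customers · **250,000+** crowdworkers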

<details>
<summary><b>📜 View full history (2018–2026)</b></summary>

### 🌱 Founding & Early Traction (2018–2020)

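| Date | Event |
|---|---|
| 2018.11 | Selectstar Inc. founded |
| 2018.12 | Top prize at the KAIST startup competition (E*5) |
| 2019.07 | KRW 400 million seed investment from Kakao Ventures |
| 2019.09 | Built the KorQuAD 2.0 dataset (jointly with LG CNS) |
| 2019.10 | Selected for the TIPS program |
| 2019.12 | Corporate R&D center officially recognized |
| 2020.09 | Series A: KRW 4 billion raised (Kakao Ventures · Kolon Investment · Company K Partners) |
| 2020.10 | **SideGuide** paper (IROS 2020): a large-scale sidewalk dataset |
| 2020.11 | Data Stars top prize (Minister of Science and ICT Award) |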

### 🚀 Scale-Up & Global Recognition (2021–2022)

| Date | Event |
|---|---|
| 2021.01 | Selected for Samsung C-Lab Outside |
| **2021.04** | 🏅 Selected for **Forbes "30 Under 30 Asia"**, Enterprise Technology (all four co-founders) |
| 2021.11 | **KLUE** paper at NeurIPS 2021 Datasets & Benchmarks |
| 2022.01 | Participated in CES 2022 (Samsung C-Lab) |
| 2022.02 | Appointed to the technology subcommittee of the 1st AI Ethics Policy Forum |
| 2022.03 | **Instance-wise Occlusion and Depth Orders** paper at CVPR 2022 |
| 2022.07 | Series A extension: KRW 9 billion raised |
| 2022.07 | Certified as a technology-innovative SME (Inno-Biz) |
| 2022.11 | **KOLD** (EMNLP 2022), **CochlScene** (APSIPA 2022), and **Split-GCN** (TPAMI, first author) papers |

### 🧠 LLM Era & AI Safety (2023–2024)

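| Date | Event |
|---|---|
| 2023.05 | Series A extension: KRW 4 billion raised (Korea Development Bank) |
| 2023.06 | Grand prize at the AI-based Defense Innovation Forum (Army Chief of Staff Award) |
| 2023.07 | Keynote speaker at the "AI Talk with Andrew Ng" event |
| 2023.10 | Speaker at Samsung Developer Conference 2023 |
| 2023.11 | Special prize at the Korea Digital Innovation Award |
| 2023.12 | **Analyzing Norm Violations in Live-Stream Chat** paper at EMNLP 2023 |
| 2023.12 | Built Korea's first "hyperscale language model reliability benchmark dataset" (NIA) |
| 2024.04 | Planned and ran the **Gen AI Korea 2024: Generative AI Red Team Challenge** conference (Ministry of Science and ICT) |
| **2024.08** | **KorNAT** paper at ACL 2024 Findings: the first first-author dataset paper at a top global AI conference by a Korean AI data company |
| 2024.10 | Appointed to KT's "Responsible AI Advisory Committee" |
| 2024.11 | Excellence Award at the 2nd AI Reliability Awards (Korea Information Society Development Institute President's Award) |
| 2024.11 | Speaker at GSMA AI Summit 2024 |
| 2024.12 | Obtained Korea's first **DQ (Data Quality) certification for LLM harmlessness evaluation data** (TTA) |
| 2024.12 | 2024 Asia AI Awards, Korea Venture Business Association Chairman's Award |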

### 🌐 Agentic AI & Global Expansion (2025–2026)

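| Date | Event |
|---|---|
| 2025.02 | **Datumo Eval launched**: Korea's first automated LLM evaluation platform |
| 2025.03 | Co-hosted the **Gen AI Red Team Challenge** (MWC Barcelona, GSMA): the world's first offline global red team challenge |
| 2025.04 | Appointed to the AI Basic Act safety guideline task force (Ministry of Science and ICT · AI Safety Institute; CEO Kim Se-yeop, 김세엽) |
| 2025.05 | 🏅 Selected for **Forbes Korea "2025 Korea AI 50"** |
| 2025.06 | Final selection for Samsung Financial C-Lab Outside (collaboration with Samsung Life Insurance on financial AI reliability verification) |
| 2025.07 | Joined the private AI reliability certification 'AI-MASTER' as a testing organization (Korea's first private-sector-led scheme) |
| 2025.08 | **Series B: KRW 20.5 billion raised** |
| 2025.08 | 🏅 Selected for **Forbes Asia "100 to Watch 2025"** |
| 2025.08 | Selected as an elite team for Korea's **Independent AI Foundation Model project (독파모)** (data lead in the SKT consortium) |
| 2025.09 | Appointed as a **data subcommittee member of the National AI Strategy Committee** (CEO Kim Se-yeop, 김세엽) |
| 2025.09 | Sponsored the Ministry of Food and Drug Safety's red team challenge for advanced AI digital medical products: Asia's first "medical red team" |
| 2025.10 | Named **top startup** of Samsung Financial C-Lab Outside (Samsung Life Insurance) |
| 2025.11 | **CAC-CoT · CoBA · GRADE**: three papers accepted at EMNLP 2025 at the same time |
| 2025.11 | 2025 Edaily AI Korea Awards (Korea Artificial Intelligence Industry Association President's Award) |
| 2025.11 | Good AI Awards 2025, NIA President's Award |
| 2025.12 | Additional Series B investment of KRW 5.5 billion: **cumulative funding surpasses KRW 43.4 billion** |
| **2026.01** | 🇰🇷 **Passed the first-round evaluation of the Independent AI Foundation Model project (독파모)** (SKT consortium) |
| 2026.02 | **CAGE** paper at the ICLR 2026 Main Conference |
| 2026.03 | Joined the **GSMA 'Open Telco AI'** global alliance as an official partner (MWC Barcelona) |
| 2026.03 | Co-organized the MWC 2026 Gen AI Red Team Challenge (GSMA · LG U+) |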

</details>

---

## 📖 Publications

Papers that Selectstar authored alone, co-authored, or supported, organized around top international AI/ML venues.

<details open>
<summary><b>🔥 2026 (5 papers)</b></summary>

| Paper | Co-authors | Venue |
|---|---|---|
| **STAR-Teaming**: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming | Selectstar | ACL 2026 |
| **FinRED**: An Expert-Guided Red-Teaming Benchmark for Financial LLM Safety | Selectstar · Financial Security Institute AI Innovation Office | KDD 2026 Dataset & Benchmark Track |
| [**CAGE**](https://openreview.net/forum?id=gCm55KYiqz): A Framework for Culturally Adaptive Red-Teaming Benchmark Generation | Selectstar | **ICLR 2026** Main |
| **E-star-12B**: Rubric-Following Evaluator Adaptive Across Industrial Domains | Selectstar | ACL 2026 Workshop (in progress) |
| **ATA**: Autonomous Tabular-data Analysis for Insight Generation via Statistical Methods | Selectstar · Samsung Securities Financial AI Center | Submitted to ARR |

</details>

<details>
<summary><b>📄 2025 (3 papers)</b></summary>

| Paper | Co-authors | Venue |
|---|---|---|
| [**CoBA**](https://aclanthology.org/2025.emnlp-main.520/): Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples | Selectstar · Chung-Ang University | EMNLP 2025 Main |
| [**GRADE**](https://aclanthology.org/2025.findings-emnlp.236/): Generating multi-hop QA and fine-gRAined Difficulty matrix for RAG Evaluation | Selectstar · KAIST | EMNLP 2025 Findings |
| [**CAC-CoT**](https://aclanthology.org/2025.findings-emnlp.1062/): Connector-Aware Compact Chain-of-Thought for Efficient Reasoning Data Synthesis Across Dual-System Cognitive Tasks | Selectstar | EMNLP 2025 Findings |

</details>

<details>
<summary><b>📄 2024 (1 paper)</b></summary>

| Paper | Co-authors | Venue |
|---|---|---|
| [**KorNAT**](https://arxiv.org/abs/2402.13605): LLM Alignment Benchmark for Korean Social Values and Common Knowledge | Selectstar · KAIST · SKT · LG · NAVER · KT · NIA | ACL 2024 Findings |

> The first first-author dataset paper at a top global AI conference by a Korean AI data company

</details>

<details>
<summary><b>📄 2020–2023 (7 papers)</b></summary>

| Year | Paper | Venue |
|---|---|---|
| 2023 | [**Analyzing Norm Violations in Live-Stream Chat**](https://aclanthology.org/2023.emnlp-main.55/) | EMNLP 2023 |
| 2022 | [**KOLD**](https://aclanthology.org/2022.emnlp-main.744/): Korean Offensive Language Dataset | EMNLP 2022 |
| 2022 | [**Split-GCN**](https://ieeexplore.ieee.org/document/9984937): Effective Interactive Annotation for Segmentation of Disconnected Instance | IEEE TPAMI (first author) |
| 2022 | [**Instance-wise Occlusion and Depth Orders**](https://openaccess.thecvf.com/content/CVPR2022/html/Lee_Instance-Wise_Occlusion_and_Depth_Orders_in_Natural_Scenes_CVPR_2022_paper.html) | CVPR 2022 |
| 2022 | [**CochlScene**](https://ieeexplore.ieee.org/document/9979822): Acquisition of acoustic scene data using crowdsourcing | APSIPA 2022 |
| 2021 | [**KLUE**](https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/98dce83da57b0395e163467c9dae521b-Abstract-round2.html): Korean Language Understanding Evaluation | NeurIPS 2021 Datasets & Benchmarks |
| 2020 | [**SideGuide**](https://ieeexplore.ieee.org/document/9340734): A Large-scale Sidewalk Dataset for Guiding Impaired People | IROS 2020 |

</details>

> For the full publication list and further details, see our [blog](https://selectstar.ai/blog/) or [contact us](https://selectstar.ai/contact_page/).

---

## 🤝 Connect
| | |
|---|---|
| 🌐 Website | [selectstar.ai](https://selectstar.ai/) |
| 📰 Blog | [selectstar.ai/blog](https://selectstar.ai/blog/) |
| 💼 Enterprise inquiries | [Contact form](https://selectstar.ai/contact_page/) |
| 💬 Community | [Discussion tab](https://huggingface.co/spaces/datumo/README/discussions) |
| 🔔 Updates | Follow us on Hugging Face to get notified of new releases |

---

<div align="center">
<sub>⭐ Building the data foundation for trustworthy AI · Made with care in Seoul 🇰🇷</sub>
</div>