---
license: apache-2.0
language:
- ko
base_model:
- K-intelligence/Midm-2.0-Mini-Instruct
tags:
- image-to-text
- korean
- image
- VLM
- bigdefence
- midm
- KT
- K-intelligence
pipeline_tag: image-to-text
---

## 📊 Midm-2.0-Mini-Vision-Instruct

- **Midm-2.0-Mini-Vision-Instruct** is a high-performance, lightweight Vision-Language Model specialized for Korean image recognition. Built on K-intelligence/Midm-2.0-Mini-Instruct, it is optimized for understanding images that contain Korean text and for generating responses in Korean.
- **End-to-End**: The model adopts a LLaVA architecture, so everything from image input to text output is handled in a single pipeline, with multimodal processing supported without any additional intermediate model.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/653494138bde2fae198fe89e/NAGzLbylQfYIJN-JI4NBN.png)

### 📂 Model Access

- **GitHub**: [bigdefence/midm-vision](https://github.com/bigdefence/midm-vision) 🌐
- **HuggingFace**: [bigdefence/Midm-2.0-Mini-Vision-Instruct](https://huggingface.co/bigdefence/Midm-2.0-Mini-Vision-Instruct) 🤗
- **Model size**: 2B parameters 📊

## 🌟 Key Features

- **🇰🇷 Korean-specialized**: Optimized for the linguistic characteristics of Korean
- **⚡ Lightweight**: Efficient inference with 2B parameters
- **🎯 High accuracy**: Strong performance across diverse Korean-language settings
- **🔧 Practical**: Suitable for real-time image recognition applications

## 📋 Model Information

| Item | Details |
|------|----------|
| **Base model** | K-intelligence/Midm-2.0-Mini-Instruct |
| **Language** | Korean |
| **Model size** | ~2B parameters |
| **Task type** | Image-to-Text (image multimodal) |
| **License** | Apache 2.0 |

### 🔧 Repository Download and Environment Setup

To get started with **Midm-2.0-Mini-Vision-Instruct**, clone the repository and set up the environment as follows. 🛠️

1. **Clone the repository**:
   ```bash
   git clone https://github.com/bigdefence/midm-vision
   cd midm-vision
   ```
2. **Install dependencies**:
   ```bash
   conda create -n midm-vision python=3.10 -y
   conda activate midm-vision
   pip install -e .
   pip install flash-attn==2.5.2 --no-build-isolation
   ```

### 📥 Download Methods

**Using the Hugging Face CLI**:
```bash
pip install -U huggingface_hub
huggingface-cli download bigdefence/Midm-Vision --local-dir ./checkpoints
```

**Using `snapshot_download`**:
```bash
pip install -U huggingface_hub
```
```python
from huggingface_hub import snapshot_download

# Download the full model repository into ./checkpoints.
# Recent huggingface_hub releases resume interrupted downloads automatically,
# so the deprecated resume_download flag is not needed.
snapshot_download(
    repo_id="bigdefence/Midm-Vision",
    local_dir="./checkpoints",
)
```

**Using Git**:
```bash
git lfs install
git clone https://huggingface.co/bigdefence/midm-vision
```

### 🔄 Local Inference

To run inference with **Midm-Vision**, follow the steps below to set up the model and run it locally. 📡

1. **Prepare the model**:
   - Download **Midm-2.0-Mini-Vision-Instruct** from [HuggingFace](https://huggingface.co/bigdefence/Midm-2.0-Mini-Vision-Instruct) 📦
2. **Run inference**:
   - **Streaming**
     ```bash
     python3 infer.py --model-path checkpoints --image-file test.jpg
     ```

## 🔧 Training Details

### Training Configuration

- **Base Model**: K-intelligence/Midm-2.0-Mini-Instruct
- **Hardware**: 4x NVIDIA RTX 4090 GPU
- **Training Time**: 10 hours

## 📜 License

This model is distributed under the Apache 2.0 license. Commercial use is permitted; see the [LICENSE](LICENSE) file for details.

## 📞 Contact

- **Developer**: BigDefence
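
## 🧪 Scripted Inference (Sketch)

The local inference command above can also be launched from Python, which makes it easier to check inputs before a long model load. The following is a minimal sketch, not part of the repository: the helper names `build_infer_command` and `run_inference` are hypothetical, and the only things taken from this README are the `infer.py` script and its `--model-path` and `--image-file` flags. It uses `sys.executable` rather than `python3` so the call stays inside the active conda environment.

```python
import subprocess
import sys
from pathlib import Path


def build_infer_command(model_path, image_file, script="infer.py"):
    """Build the argv list for the README's inference command.

    Hypothetical helper: only the script name and the two flags come
    from the README; everything else is illustrative.
    """
    return [
        sys.executable,
        script,
        "--model-path", str(model_path),
        "--image-file", str(image_file),
    ]


def run_inference(model_path="checkpoints", image_file="test.jpg", script="infer.py"):
    """Validate the inputs, then run infer.py as a subprocess."""
    if not Path(model_path).is_dir():
        raise FileNotFoundError(f"model directory not found: {model_path}")
    if not Path(image_file).is_file():
        raise FileNotFoundError(f"image file not found: {image_file}")
    if not Path(script).is_file():
        raise FileNotFoundError(f"inference script not found: {script}")

    # check=True raises CalledProcessError if infer.py exits with a non-zero code.
    return subprocess.run(build_infer_command(model_path, image_file, script), check=True)
```

Calling `run_inference()` from the root of the cloned repository, after downloading the weights into `./checkpoints`, should be equivalent to running the shell command directly. The up-front path checks simply fail fast with a clear message when the checkpoint directory or the image is missing.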