philipp-zettl
AI & ML interests
NLP/CV/Multimodal learning
Recent Activity
- upvoted a changelog "Introducing Kernels" about 22 hours ago
- upvoted a collection "deployed-models" 5 days ago
- replied to their post 7 days ago
I've been cooking something neat over the past weeks 👨‍🍳
We all know that training LLMs requires a lot of resources, especially compute in the form of GPUs; doing it on CPUs is super slow and inefficient.
The big players use giant clusters of Nvidia H100s.
But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTX cards. If you're lucky, you got yourself a 5080 with 16 GB VRAM or something.
To be frank, I don't have that 1.3k of disposable cash lying around ¯\_(ツ)_/¯
But I can write Rust, and I like building ML libraries.
So I asked myself the question(s):
- Can I train SLMs at home on my hardware?
- How hard can it be to build an ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- How hard can it be to implement bf16 support?
The answers are wild, trust me!
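On the bf16 question, the core trick is well known: bf16 is just the top 16 bits of an f32 (same 8-bit exponent, truncated mantissa), so the CPU-side conversion boils down to round-to-nearest-even plus a truncation. Here's a minimal sketch of that conversion in Rust; this is my illustration of the standard algorithm, not the library's actual code:

```rust
/// f32 -> bf16 bits: keep the high 16 bits, rounding to nearest even.
fn f32_to_bf16(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Force a mantissa bit so rounding can't turn a NaN into infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // The bias depends on the lowest bit we keep (ties-to-even), then truncate.
    let bias = 0x7FFF + ((bits >> 16) & 1);
    (bits.wrapping_add(bias) >> 16) as u16
}

/// bf16 bits -> f32: pad the low 16 bits with zeros.
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

fn main() {
    let x = 3.1415927_f32;
    let b = f32_to_bf16(x);
    // Prints: 3.1415927 -> 0x4049 -> 3.140625 (bf16 keeps ~2-3 decimal digits).
    println!("{x} -> 0x{b:04X} -> {}", bf16_to_f32(b));
}
```

That covers storage and conversion; native bf16 arithmetic needs Ampere-or-newer hardware, and a common approach on pre-Ampere cards like the RTX 2060 is to keep bf16 for storage and upcast to f32 for the math.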
Image 1: Metrics from last night's build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8 GB VRAM)
The majority of my time went into the shared memory management, but it's stable and I'm very excited!
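So what does "streaming between RAM and VRAM on demand" actually mean? Conceptually: a pool tracks where every tensor lives, and before a kernel touches one, it evicts the coldest resident tensors to host memory until the new one fits. Below is a minimal bookkeeping sketch of that idea; every name in it (`Pool`, `touch`, `reclaim`) is hypothetical, and the comments mark where the real CUDA copies would go:

```rust
use std::collections::VecDeque;

/// Where a tensor's bytes currently live (hypothetical types).
enum Location {
    Vram,
    Ram,
}

struct Tensor {
    bytes: usize,
    location: Location,
}

struct Pool {
    budget: usize,        // VRAM budget in bytes
    used: usize,          // bytes currently resident on the GPU
    lru: VecDeque<usize>, // resident tensor indices, coldest first
    tensors: Vec<Tensor>,
}

impl Pool {
    /// Evict cold tensors to host RAM until `needed` bytes fit in VRAM.
    fn reclaim(&mut self, needed: usize) {
        while self.budget.saturating_sub(self.used) < needed {
            let Some(victim) = self.lru.pop_front() else { break };
            let t = &mut self.tensors[victim];
            t.location = Location::Ram; // real code: cudaMemcpyDeviceToHost
            self.used -= t.bytes;
        }
    }

    /// Call before a kernel reads a tensor: page it in if it was offloaded.
    fn touch(&mut self, id: usize) {
        if let Location::Ram = self.tensors[id].location {
            let needed = self.tensors[id].bytes;
            self.reclaim(needed);
            self.tensors[id].location = Location::Vram; // real: cudaMemcpyHostToDevice
            self.used += needed;
        }
        // Mark as most recently used.
        self.lru.retain(|&i| i != id);
        self.lru.push_back(id);
    }
}

fn main() {
    // Two 1 GiB gradient blobs against a 1.5 GiB budget: touching the
    // second one forces the first to be offloaded back to RAM.
    let mut pool = Pool {
        budget: 3 << 29, // 1.5 GiB
        used: 0,
        lru: VecDeque::new(),
        tensors: vec![
            Tensor { bytes: 1 << 30, location: Location::Ram },
            Tensor { bytes: 1 << 30, location: Location::Ram },
        ],
    };
    pool.touch(0);
    pool.touch(1); // evicts tensor 0
    println!("VRAM used: {} / {}", pool.used, pool.budget);
}
```

The "Currently available / attempting to reclaim" lines in the logs below are exactly this eviction loop reporting its numbers.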
Here are some debug logs, à la "trust me bro" (counts are in bytes; the 1073741824 being reclaimed is exactly 1 GiB):
```
----
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6744 MB / 7805 MB
Data on GPU: 1641 MB
Grads on GPU: 3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used: 6776 MB / 7805 MB
Data on GPU: 1561 MB
Grads on GPU: 3279 MB
CPU Offloaded: 18590 MB
-----------------------------
```
Final models get exported in `safetensors` format and are compatible with PyTorch and `transformers`, for accessibility.
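And since the checkpoints are plain `safetensors` files, you can sanity-check an export from Rust too. A sketch using the `safetensors` crate; I'm assuming its `SafeTensors::deserialize` API here, and the file name is a placeholder:

```rust
// Cargo.toml: safetensors = "0.4" (version is an assumption)
use safetensors::SafeTensors;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder path for an exported checkpoint.
    let data = std::fs::read("model.safetensors")?;
    let st = SafeTensors::deserialize(&data)?;
    // List every tensor with its dtype and shape.
    for name in st.names() {
        let view = st.tensor(name)?;
        println!("{name}: {:?} {:?}", view.dtype(), view.shape());
    }
    Ok(())
}
```

On the Python side, `safetensors.torch.load_file("model.safetensors")` reads the same file straight into PyTorch tensors.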
- [^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory

Organizations
DiffFT
MTG Embedding models
SO-Prep
A set of datasets that, combined and extended, will be used to create the biggest open Structured Output dataset. **Note**: English only
LargeWurstModels
F(T5+1)
Fine-tuned T5 models, hence F of T5 + 1 ;-)
summarization
embedding-models
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 54
- thenlper/gte-base
  Sentence Similarity • 0.1B • Updated • 260k • 131
- Alibaba-NLP/gte-multilingual-reranker-base
  Text Ranking • 0.3B • Updated • 109k • 176
- BAAI/bge-m3
  Sentence Similarity • Updated • 15.6M • 2.91k
not closed TTS
NPC models
ImageNet(s)
Models trained for image classification using ImageNet
OCR
Diffusion Language Models
Experimental diffusion-style MLM built on top of ModernBERT. Inspired by https://nathan.rs/posts/roberta-diffusion/
RP
RAG STACK
VLMs
- baidu/ERNIE-4.5-VL-28B-A3B-PT
  Image-Text-to-Text • 29B • Updated • 57.7k • 97
- Qwen/Qwen2.5-VL-3B-Instruct
  Image-Text-to-Text • 4B • Updated • 6.31M • 634
- Qwen/Qwen2.5-VL-7B-Instruct
  Image-Text-to-Text • 8B • Updated • 4.52M • 1.49k
- Qwen/Qwen2-VL-2B-Instruct
  Image-Text-to-Text • Updated • 2.45M • 498
Chess ♟️
Datasets, models and spaces related to chess
ToS'
Models and datasets connected to the Terms of Service of web services.
good-summaries
based on personal preferences
llamas
sd-1.5
Collection of SD 1.5 fine-tunes or LoRAs
secret sauce FLUX
BG-RM
OSS Background Removers
CV datasets