Hugging Face
Dominick Wirzba (Chronuid)
0 followers · 15 following
dominick-wirzba-a46898115
AI & ML interests
None yet
Recent Activity
reacted to philipp-zettl's post with 👍 · 2 days ago
I've been cooking something neat over the past weeks 👨‍🍳

We all know that training LLMs requires a lot of resources, especially a lot of compute in the form of GPUs, and is super slow and inefficient when done on CPUs. The big players use giant clusters of Nvidia H100s. But if I look at the profiles of my fellow home brewers, all we can get our hands on are those pesky consumer RTXs. If you're lucky, you got yourself a 5080 with 16GB VRAM or something. To be frank, I don't have that 1.3k of disposable cash lying around ¯\_(ツ)_/¯

But I can write Rust and like building ML libraries. So I asked myself the question(s):
- Can I train SLMs at home on my hardware?
- How hard can it be to build an ML library that can stream data between RAM and VRAM on demand, like llama.cpp's unified memory feature [^1]?
- How hard can it be to implement bf16 support?

The answers are wild, trust me!

Image 1: Metrics from last night's build on my "tiny" RTX 2060 (6 GB VRAM)
Image 2: Metrics from my most recent build on my RTX 4070 Laptop (8 GB VRAM)

The majority of my time went into the shared memory, but it's stable and I'm very excited! Here are some debug logs, à la "trust me bro":

```
Currently available: 1112735744, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used:   6744 MB / 7805 MB
Data on GPU:   1641 MB
Grads on GPU:  3459 MB
CPU Offloaded: 18230 MB
---------------------------------
Currently available: 1079181312, attempting to reclaim: 1073741824
--- VRAM STATE [backward pass] ---
Driver Used:   6776 MB / 7805 MB
Data on GPU:   1561 MB
Grads on GPU:  3279 MB
CPU Offloaded: 18590 MB
---------------------------------
```

Final models get exported in `safetensors` format and are compatible with PyTorch and `transformers`, for accessibility.

[^1]: https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#unified-memory
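The debug log above shows the core mechanic: when free VRAM drops below the amount a kernel needs, tensors get parked in host RAM until enough is reclaimed. The post's library is Rust and not public, so here is only a minimal, hypothetical pure-Python sketch of that reclaim accounting; the names `Pool`, `alloc`, and `reclaim` are illustrative, not the library's API.

```python
# Hypothetical sketch of on-demand RAM<->VRAM streaming accounting:
# evict least-recently-allocated tensors to host RAM until the
# requested number of bytes fits. Not the author's actual code.
from collections import OrderedDict

class Pool:
    def __init__(self, vram_capacity):
        self.vram_capacity = vram_capacity
        self.on_gpu = OrderedDict()   # name -> size in bytes, oldest first
        self.offloaded = {}           # name -> size, parked in host RAM

    def vram_used(self):
        return sum(self.on_gpu.values())

    def reclaim(self, needed):
        """Offload tensors (oldest first) until `needed` bytes fit."""
        freed = 0
        while self.vram_used() + needed > self.vram_capacity and self.on_gpu:
            name, size = self.on_gpu.popitem(last=False)  # evict oldest
            self.offloaded[name] = size
            freed += size
        return freed

    def alloc(self, name, size):
        """Place a tensor on the GPU, offloading others if necessary."""
        if self.vram_used() + size > self.vram_capacity:
            self.reclaim(size)
        self.on_gpu[name] = size

pool = Pool(vram_capacity=6 * 2**30)  # e.g. a 6 GB RTX 2060
pool.alloc("weights", 4 * 2**30)
pool.alloc("grads", 3 * 2**30)        # forces "weights" to host RAM
```

A real implementation would also copy the tensor bytes (e.g. via pinned host buffers) and fault them back in before the next kernel touches them; this sketch only tracks the bookkeeping the log lines report.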
reacted to sergiopaniego's post with 🔥 · about 1 month ago
Qwen3.5 dense (smol 🤏) models just dropped:
- natively multimodal
- 0.8B · 2B · 4B · 9B (+ base variants)
- 262K context, extensible to 1M
- built-in thinking

Fine-tune them with TRL out of the box → SFT, GRPO, DPO, and more!
Examples: https://huggingface.co/docs/trl/example_overview
Collection: https://huggingface.co/collections/Qwen/qwen35
reacted to sergiopaniego's post with 🔥 · about 1 month ago
Did you know you can train agentic models with RL, deploying the environments on HF Spaces? 🤗

With TRL + OpenEnv, your training script connects to remote environments hosted as Spaces. Want to train faster? → Just add more Spaces (TRL handles the parallelization natively).

We used this to train a model to solve the trolley problem in CARLA: 2 HF Spaces running a full driving simulator, each on a T4 GPU.

Full write-up with code and results → https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl
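The "just add more Spaces" scaling idea can be sketched as fanning rollout requests out round-robin over several remote environment endpoints. This is a hypothetical illustration only: `StubEnv` stands in for an HTTP client to an environment Space, the URLs are made up, and the real wiring is done by TRL + OpenEnv, not this code.

```python
# Hypothetical sketch: dispatch rollouts over N remote environment
# endpoints in parallel. StubEnv and the .hf.space URLs are
# illustrative stand-ins, not the OpenEnv client API.
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

class StubEnv:
    """Stands in for an HTTP client to one environment Space."""
    def __init__(self, url):
        self.url = url

    def rollout(self, prompt):
        # A real client would POST the prompt and return the episode.
        return {"env": self.url, "prompt": prompt, "reward": 0.0}

def parallel_rollouts(prompts, endpoints):
    """One rollout per prompt, cycling round-robin over the endpoints."""
    envs = [StubEnv(u) for u in endpoints]
    pairs = list(zip(prompts, cycle(envs)))
    with ThreadPoolExecutor(max_workers=len(envs)) as pool:
        # map preserves input order, so results line up with prompts
        return list(pool.map(lambda p: p[1].rollout(p[0]), pairs))

results = parallel_rollouts(
    ["ep-0", "ep-1", "ep-2", "ep-3"],
    ["https://user-carla-env-0.hf.space", "https://user-carla-env-1.hf.space"],
)
```

Adding a third endpoint to the list adds a third worker; throughput scales with the number of Spaces because each rollout blocks on its own remote simulator.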
Chronuid's activity
liked a model · 4 months ago
google/functiongemma-270m-it — Text Generation · Updated Jan 14 · 37.3k downloads · 969 likes
liked 2 models · 11 months ago
OS-Copilot/OS-Atlas-Pro-7B — Image-Text-to-Text · 8B · Updated Nov 19, 2024 · 41 downloads · 28 likes
jinaai/jina-embeddings-v3 — Feature Extraction · 0.6B · Updated 6 days ago · 2.54M downloads · 1.14k likes