Spaces: vrfai/README

hungho77 committed 10 days ago

Commit 881cbf2 (verified) · 1 parent: 5850ecc

Update README.md

README.md +12 -97

@@ -9,106 +9,21 @@ pinned: false
 # VRFAI — Edge AI & Model Optimization
-We focus on **making AI models actually run in the real world** — on robots, edge devices, and constrained systems.
-This space collects:
-- ⚡ Model optimization techniques (quantization, pruning, distillation)
-- 🚀 Deployment pipelines (TensorRT, ONNX Runtime, OpenVINO, edge runtimes)
-- 🤖 Robotics-oriented AI (Vision-Language-Action, real-time inference)
-- 🧪 Practical experiments and benchmarks (not just theory)
----
-## 🔧 What we work on
-### 1. Model Optimization (Model-side)
-- INT8 / INT4 / FP8 / NVFP4 quantization
-- Structured pruning (2:4 sparsity, token pruning)
-- Distillation & lightweight architectures
-→ Goal: **reduce compute + memory while preserving behavior**
----
-### 2. Deployment & Runtime Optimization (System-side)
-- TensorRT / TensorRT-LLM (NVIDIA Jetson & GPU)
-- ONNX Runtime / OpenVINO / TVM
-- Edge SoC stacks (Qualcomm QNN, LiteRT, ExecuTorch)
-→ Goal: **turn models into fast, hardware-efficient engines**
----
-### 3. Real-world Pipelines (Robotics focus)
-- Vision + Language + Action (VLA)
-- Multi-modal inference pipelines
-- Real-time constraints (latency, stability, safety)
-→ Goal: **make models usable in control loops, not just benchmarks**
----
-## 📊 What actually matters (our philosophy)
-Model optimization is not one trick — it’s a **full stack problem**:
-- Weights → memory footprint
-- KV cache → long-context bottleneck
-- Kernels → real latency
-- Scheduling → throughput & stability
-> The biggest wins usually come from **matching model compression to runtime kernels**, not just applying techniques blindly.
----
-## 🧪 Example Work
-- ⚙️ TensorRT optimization for VLA policies
-- 📉 INT8 / INT4 quantization with minimal behavior drift
-- 🚀 End-to-end latency profiling (not just model FPS)
-- 🔬 Benchmark pipelines with real-world constraints
----
-## 🎯 Focus Areas
-- Edge AI deployment (Jetson, embedded, mobile SoCs)
-- Real-time inference systems
-- Efficient VLA / robotics models
-- Practical optimization over academic benchmarks
----
-## ⚠️ Disclaimer
-This is a collection of:
-- experiments
-- working notes
-- partially-explored ideas
-Expect rough edges — but also **high-signal insights from real deployment work**.
----
-## 🤝 Contributions
-If you're working on:
-- edge AI
-- model optimization
-- robotics ML
-Feel free to open issues or PRs.
----
-## 🔗 Related Work
-- Optimization notes: model-level + system-level pipelines
-- Deep dives: quantization, KV cache, serving engines
-- SoC landscape: non-CUDA deployment stacks
-(See repo for full notes and examples)
 ---
-**VRFAI — Making AI models fast, deployable, and real.**

 # VRFAI — Edge AI & Model Optimization
+We optimize and deploy **LLMs, ASR, VLM and VLA (Vision-Language-Action) models** on real-world systems.
+## 🔧 What we do
+- Optimization: quantization (INT8/INT4/FP8/NVFP4), pruning, distillation, ...
+- Deployment: vLLM, TensorRT, ONNX Runtime, edge runtimes
+- Systems: real-time pipelines (vision, audio, language, action)
+## 🎯 Focus
+- Edge devices (Jetson, SoCs)
+- Robotics & VLA systems
+- Latency, stability, deployability
+## ⚡ Philosophy
+Optimization = **model + runtime + system**
 ---
+**VRFAI — making AI models fast, efficient, and real**