hungho77 committed on
Commit
881cbf2
·
verified ·
1 Parent(s): 5850ecc

Update README.md

Files changed (1)
  1. README.md +12 -97
README.md CHANGED
@@ -9,106 +9,21 @@ pinned: false

  # VRFAI — Edge AI & Model Optimization

- We focus on **making AI models actually run in the real world** — on robots, edge devices, and constrained systems.

- This space collects:
- - ⚡ Model optimization techniques (quantization, pruning, distillation)
- - 🚀 Deployment pipelines (TensorRT, ONNX Runtime, OpenVINO, edge runtimes)
- - 🤖 Robotics-oriented AI (Vision-Language-Action, real-time inference)
- - 🧪 Practical experiments and benchmarks (not just theory)

- ---
-
- ## 🔧 What we work on
-
- ### 1. Model Optimization (Model-side)
- - INT8 / INT4 / FP8 / NVFP4 quantization
- - Structured pruning (2:4 sparsity, token pruning)
- - Distillation & lightweight architectures
-
- → Goal: **reduce compute + memory while preserving behavior**
-
- ---
-
- ### 2. Deployment & Runtime Optimization (System-side)
- - TensorRT / TensorRT-LLM (NVIDIA Jetson & GPU)
- - ONNX Runtime / OpenVINO / TVM
- - Edge SoC stacks (Qualcomm QNN, LiteRT, ExecuTorch)
-
- → Goal: **turn models into fast, hardware-efficient engines**
-
- ---
-
- ### 3. Real-world Pipelines (Robotics focus)
- - Vision + Language + Action (VLA)
- - Multi-modal inference pipelines
- - Real-time constraints (latency, stability, safety)
-
- → Goal: **make models usable in control loops, not just benchmarks**
-
- ---
-
- ## 📊 What actually matters (our philosophy)
-
- Model optimization is not one trick — it's a **full-stack problem**:
-
- - Weights → memory footprint
- - KV cache → long-context bottleneck
- - Kernels → real latency
- - Scheduling → throughput & stability
-
- > The biggest wins usually come from **matching model compression to runtime kernels**, not just applying techniques blindly.
-
- ---
-
- ## 🧪 Example Work
-
- - ⚙️ TensorRT optimization for VLA policies
- - 📉 INT8 / INT4 quantization with minimal behavior drift
- - 🚀 End-to-end latency profiling (not just model FPS)
- - 🔬 Benchmark pipelines with real-world constraints
-
- ---
-
- ## 🎯 Focus Areas
-
- - Edge AI deployment (Jetson, embedded, mobile SoCs)
- - Real-time inference systems
- - Efficient VLA / robotics models
- - Practical optimization over academic benchmarks
-
- ---
-
- ## ⚠️ Disclaimer
-
- This is a collection of:
- - experiments
- - working notes
- - partially-explored ideas
-
- Expect rough edges — but also **high-signal insights from real deployment work**.
-
- ---
-
- ## 🤝 Contributions
-
- If you're working on:
- - edge AI
- - model optimization
- - robotics ML
-
- Feel free to open issues or PRs.
-
- ---
-
- ## 🔗 Related Work
-
- - Optimization notes: model-level + system-level pipelines
- - Deep dives: quantization, KV cache, serving engines
- - SoC landscape: non-CUDA deployment stacks

- (See repo for full notes and examples)

  ---

- **VRFAI — Making AI models fast, deployable, and real.**
 

  # VRFAI — Edge AI & Model Optimization

+ We optimize and deploy **LLMs, ASR, VLM, and VLA (Vision-Language-Action) models** on real-world systems.

+ ## 🔧 What we do
+ - Optimization: quantization (INT8/INT4/FP8/NVFP4), pruning, distillation, ...
+ - Deployment: vLLM, TensorRT, ONNX Runtime, edge runtimes
+ - Systems: real-time pipelines (vision, audio, language, action)
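
The quantization item above can be illustrated with a minimal, framework-free sketch: symmetric per-tensor INT8 quantization with a round-trip check. The helper names (`quantize_int8`, `dequantize_int8`) are illustrative only; real deployments would use TensorRT, ONNX Runtime, or PyTorch quantization tooling instead.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization (illustrative only;
# production pipelines use TensorRT / ONNX Runtime / PyTorch tooling).

def quantize_int8(weights):
    """Map float weights to INT8 codes plus a single per-tensor scale.

    Assumes at least one nonzero weight (otherwise the scale would be zero).
    """
    scale = max(abs(w) for w in weights) / 127.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Reconstruct approximate float weights from INT8 codes."""
    return [c * scale for c in codes]

weights = [4.0, 1.0, -3.0, 0.02]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
# Round-trip error is bounded by the quantization step size.
assert max(abs(w - r) for w, r in zip(weights, restored)) < scale
```

The interesting part for deployment is the trade-off this exposes: one shared scale keeps the kernel simple, but a single outlier weight inflates the step size for the whole tensor, which is why per-channel or block-wise scales are common in practice.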
 

+ ## 🎯 Focus
+ - Edge devices (Jetson, SoCs)
+ - Robotics & VLA systems
+ - Latency, stability, deployability
+ ## ⚡ Philosophy
+ Optimization = **model + runtime + system**
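
One practical consequence of the "model + runtime + system" framing is that latency must be profiled per stage, not just as model FPS. A minimal sketch of that idea, assuming a simple sequential pipeline (the stage names and workloads are placeholders, not a real VRFAI pipeline):

```python
# Sketch: per-stage latency profiling for a sequential pipeline.
# Stage names and workloads below are placeholders, not real components.
import time

def profile_pipeline(stages):
    """Run each (name, fn) stage once; return per-stage latency in milliseconds."""
    report = {}
    for name, fn in stages:
        start = time.perf_counter()
        fn()
        report[name] = (time.perf_counter() - start) * 1e3
    return report

stages = [
    ("preprocess", lambda: sum(i * i for i in range(10_000))),    # e.g. image resize
    ("inference", lambda: sum(i * i for i in range(100_000))),    # e.g. model forward
    ("postprocess", lambda: sum(i * i for i in range(1_000))),    # e.g. decoding
]
report = profile_pipeline(stages)
total_ms = sum(report.values())  # end-to-end latency, not just the model stage
assert set(report) == {"preprocess", "inference", "postprocess"}
```

Summing the stages makes it obvious when pre/post-processing, rather than the model itself, dominates end-to-end latency.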
 
  ---

+ **VRFAI — making AI models fast, efficient, and real**