specimba committed on
Commit 3f17e4c · verified · 1 Parent(s): f73414b

v4.0 README: 5 real providers, self-contained, zero heavy deps

  1. README.md +40 -42
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: NEXUS OS v2.1
  emoji: 🔥
  colorFrom: red
  colorTo: purple
@@ -7,55 +7,53 @@ sdk: gradio
  sdk_version: 6.14.0
  app_file: app.py
  pinned: false
- tags:
- - ml-intern
  ---
 
- # NEXUS OS v2.1 - Real LLM Inference via HF API
 
- **Primary backend: HF Inference API** (free tier, works immediately)
-
- This Space provides GENUINE model inference without GPU access, ngrok tunnels, or paid cloud APIs.
 
  ## How It Works
 
- ### 1. HF Inference API (Primary - No Setup Needed)
- - Uses your HF token (already active in Spaces)
- - Free tier: $0.10/month credits (~100-500 requests)
- - Models: SmolLM2-1.7B, Llama-3.2-1B, Qwen2.5-0.5B, Gemma-2-2B, Phi-4-mini
- - Just enter a prompt and click Generate - real inference immediately
-
- ### 2. Ollama Relay (Optional - Your Local Models)
- - Expose your local Ollama: `ngrok http 11434`
- - Set `OLLAMA_RELAY_URL` in Space secrets
- - Access your 37+ local models through the Space
-
- ### 3. Cloud API Fallback (Optional - Paid Providers)
- - DeepSeek, Claude, GPT-5, Qwen, Kimi, GLM
- - Add API keys to Space secrets
- - Used when HF Inference API and Ollama are unavailable
-
- ### 4. Mock Mode (Last Resort)
- - Simulated responses with full telemetry
- - Useful for testing the UI without any backends
 
  ## Features
- - **37+ real models** in registry including Nemotron-3 Nano-Omni 30B and OpenSonnet-Lite-MAX
- - **4 hallucination detectors** (EPR, Spilled Energy, CK-PLUG, TWAVE)
- - **Novel composite signals**: EEP, PTI, NEWI
- - **Per-token thermodynamic telemetry** with risk scoring
- - **VRAM-aware model filtering** - only shows models that fit your budget
-
- ## Quick Start
- 1. Open the Space
- 2. Enter a prompt
- 3. Click "Generate with NEXUS OS"
- 4. Get real inference + thermodynamic risk analysis
 
  ## Repository
  [specimba/nexus-os-v2](https://huggingface.co/datasets/specimba/nexus-os-v2)
-
- ## Troubleshooting
- - **"HF Inference API unavailable"**: Your HF token may have exhausted free credits. The Space will fallback to mock mode.
- - **"Ollama relay unreachable"**: Check your ngrok tunnel is active and the URL is correct in Space secrets.
- - **"Cloud API failed"**: Ensure API keys are added as Space secrets (not hardcoded).
 
@@ -1,5 +1,5 @@
  ---
+ title: NEXUS OS v4.0
  emoji: 🔥
  colorFrom: red
  colorTo: purple
@@ -7,55 +7,53 @@ sdk: gradio
  sdk_version: 6.14.0
  app_file: app.py
  pinned: false
  ---
 
+ # NEXUS OS v4.0 - Intelligent Multi-Provider Router
 
+ **COMPLETELY self-contained** - zero external dependencies except gradio + stdlib.
+ No torch, no pinecone, no package imports that crash on startup.
 
  ## How It Works
 
+ ### Intelligent Routing (Auto-Detected)
+ The app queries ALL configured providers in parallel, measures health + latency,
+ and picks the best one automatically. Falls back through the chain if any fail.
+
+ | Priority | Provider | Free Tier | Strength |
+ |----------|----------|-----------|----------|
+ | **1** | **HF Inference Providers** | $0.10/mo credits | Auto-routing, single HF token |
+ | **2** | **Groq** | Generous | Fastest inference (LPU chips) |
+ | **3** | **DeepSeek** | 5M tokens | Best reasoning models |
+ | **4** | **OpenRouter** | 25+ free models | Most model variety |
+ | **5** | **Together AI** | Rate-limited 70B | Large models, slow |
+ | **6** | **Ollama Relay** | Your local models | Via ngrok tunnel |
+ | **7** | **Mock** | Always works | Simulated for testing |
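
Editor's sketch (not part of the commit): the routing described in the added README text could look roughly like the following. This is a minimal illustration under stated assumptions, not the Space's actual `app.py`. The names `Provider`, `ProbeResult`, `probe`, `probe_all`, and `route` are hypothetical, and the providers are plain local callables standing in for real HTTP clients. It probes all providers in parallel, ranks the healthy ones by measured latency, and falls back through that ranking when a call fails.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical type: a provider is any callable that maps a prompt to text
# and raises an exception when it is unavailable.
Provider = Callable[[str], str]


@dataclass
class ProbeResult:
    name: str
    healthy: bool
    latency_s: float


def probe(name: str, provider: Provider) -> ProbeResult:
    """Send a tiny test prompt and time it; any exception marks the provider unhealthy."""
    start = time.perf_counter()
    try:
        provider("ping")
        healthy = True
    except Exception:
        healthy = False
    return ProbeResult(name, healthy, time.perf_counter() - start)


def probe_all(providers: Dict[str, Provider]) -> List[ProbeResult]:
    """Probe every provider concurrently and return the healthy ones, fastest first."""
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        results = list(pool.map(lambda item: probe(*item), providers.items()))
    healthy = [r for r in results if r.healthy]
    return sorted(healthy, key=lambda r: r.latency_s)


def route(prompt: str, providers: Dict[str, Provider]) -> Optional[Tuple[str, str]]:
    """Try healthy providers in latency order; return (name, text) or None if all fail."""
    for result in probe_all(providers):
        try:
            return result.name, providers[result.name](prompt)
        except Exception:
            continue  # fall back to the next candidate in the chain
    return None
```

This sketch ranks purely by measured latency. The Priority column in the table suggests the real app may also weigh a fixed provider order, which is not modeled here.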
+
+ ### Setup
+
+ **No setup needed for mock mode.** To get real inference, add API keys as Space secrets:
+
+ | Secret | Provider | Get Key At |
+ |--------|----------|------------|
+ | `HF_TOKEN` | HF Inference Providers | Already active in Spaces |
+ | `GROQ_API_KEY` | Groq | https://console.groq.com |
+ | `DEEPSEEK_API_KEY` | DeepSeek | https://platform.deepseek.com |
+ | `OPENROUTER_API_KEY` | OpenRouter | https://openrouter.ai |
+ | `TOGETHER_API_KEY` | Together AI | https://api.together.xyz |
+ | `OLLAMA_RELAY_URL` | Your local Ollama | `ngrok http 11434` |
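
Editor's sketch (not part of the commit): Space secrets are exposed to the app as environment variables, so detecting which providers are configured can be as simple as checking which of the secret names in the table are set. The function name `configured_providers` and the mapping `SECRET_TO_PROVIDER` are hypothetical; only the secret names and provider labels come from the Setup table.

```python
import os
from typing import Dict, List, Mapping, Optional

# Secret names and provider labels are taken from the Setup table above;
# the insertion order follows the README's priority table.
SECRET_TO_PROVIDER: Dict[str, str] = {
    "HF_TOKEN": "HF Inference Providers",
    "GROQ_API_KEY": "Groq",
    "DEEPSEEK_API_KEY": "DeepSeek",
    "OPENROUTER_API_KEY": "OpenRouter",
    "TOGETHER_API_KEY": "Together AI",
    "OLLAMA_RELAY_URL": "Ollama Relay",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return providers whose secret is set and non-empty; Mock is always appended last."""
    env = os.environ if env is None else env
    found = [
        name
        for secret, name in SECRET_TO_PROVIDER.items()
        if env.get(secret, "").strip()
    ]
    return found + ["Mock"]
```

Passing a plain dict instead of `os.environ` keeps the function easy to test. With no secrets set, the result is just `["Mock"]`, which matches the README's note that mock mode needs no setup.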
 
  ## Features
+ - **37+ real models** in registry
+ - **Thermodynamic telemetry**: EEP, PTI, NEWI hallucination signals
+ - **VRAM-aware filtering**: only shows models that fit your budget
+ - **Per-token risk scoring**: hallucination detection simulation
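
Editor's sketch (not part of the commit): the VRAM-aware filtering feature could be approximated as below. The README does not describe the registry format, so the `ModelEntry` class, its `vram_gb` field, and the function `models_within_budget` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelEntry:
    name: str
    vram_gb: float  # hypothetical field: estimated memory needed to load the model


def models_within_budget(registry: List[ModelEntry], budget_gb: float) -> List[ModelEntry]:
    """Keep models whose estimated VRAM need fits the budget, smallest first."""
    fitting = [m for m in registry if m.vram_gb <= budget_gb]
    return sorted(fitting, key=lambda m: m.vram_gb)


# Placeholder entries with made-up sizes, for illustration only.
EXAMPLE_REGISTRY = [
    ModelEntry("large-model", 40.0),
    ModelEntry("small-model", 2.0),
    ModelEntry("medium-model", 8.0),
]
```

For example, `models_within_budget(EXAMPLE_REGISTRY, 8.0)` keeps the small and medium placeholder entries and drops the large one.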
+
+ ## What's New in v4.0
+ - **Self-contained**: no `nexus_os_v2/` imports, no torch/pinecone dependencies
+ - **5 real providers**: HF Router, Groq, DeepSeek, OpenRouter, Together AI
+ - **Removed**: Kilocode (IDE plugin), OpenCode (IDE plugin), NVIDIA NIM (trial only), Fireworks ($1 credit)
+ - **Intelligent routing**: parallel health checks, capability-based model selection
 
  ## Repository
  [specimba/nexus-os-v2](https://huggingface.co/datasets/specimba/nexus-os-v2)