specimba committed on
Commit 42bc228 · verified · 1 Parent(s): 8c867a4

Update README: HF Inference API is now primary backend

Files changed (1)
  1. README.md +39 -9
README.md CHANGED
@@ -11,21 +11,51 @@ tags:
11
  - ml-intern
12
  ---
13
 
14
- # NEXUS OS v2.1 β€” Thermodynamic LLM Control System
15
 
16
- Hybrid Cloud + Local Inference with BEC Thermodynamic Hallucination Control.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
17
 
18
  ## Features
19
- - **37+ real models** including Nemotron-3 Nano-Omni 30B and OpenSonnet-Lite-MAX
20
- - **Ollama relay** β€” connect your local Ollama via ngrok/tunnel
21
- - **Cloud API fallback** β€” DeepSeek, Claude, GPT-5, Qwen, Kimi, GLM
22
  - **4 hallucination detectors** (EPR, Spilled Energy, CK-PLUG, TWAVE)
23
  - **Novel composite signals**: EEP, PTI, NEWI
 
 
24
 
25
- ## Setup
26
- 1. Expose your local Ollama: `ngrok http 11434`
27
- 2. Set `OLLAMA_RELAY_URL` in Space secrets
28
- 3. Add cloud API keys as needed
 
29
 
30
  ## Repository
31
  [specimba/nexus-os-v2](https://huggingface.co/datasets/specimba/nexus-os-v2)
 
 
 
 
 
 
  - ml-intern
  ---
 
+ # NEXUS OS v2.1: Real LLM Inference via HF API
 
+ **Primary backend: HF Inference API** (free tier, works immediately)
+
+ This Space provides genuine model inference without GPU access, ngrok tunnels, or paid cloud APIs.
+
+ ## How It Works
+
+ ### 1. HF Inference API (Primary, No Setup Needed)
+ - Uses your HF token (already active in Spaces)
+ - Free tier: $0.10/month credits (~100-500 requests)
+ - Models: SmolLM2-1.7B, Llama-3.2-1B, Qwen2.5-0.5B, Gemma-2-2B, Phi-4-mini
+ - Just enter a prompt and click Generate for real inference immediately
+
+ ### 2. Ollama Relay (Optional, Your Local Models)
+ - Expose your local Ollama: `ngrok http 11434`
+ - Set `OLLAMA_RELAY_URL` in Space secrets
+ - Access your 37+ local models through the Space
+
+ ### 3. Cloud API Fallback (Optional, Paid Providers)
+ - DeepSeek, Claude, GPT-5, Qwen, Kimi, GLM
+ - Add API keys to Space secrets
+ - Used when HF Inference API and Ollama are unavailable
+
+ ### 4. Mock Mode (Last Resort)
+ - Simulated responses with full telemetry
+ - Useful for testing the UI without any backends
 
  ## Features
+ - **37+ real models** in registry including Nemotron-3 Nano-Omni 30B and OpenSonnet-Lite-MAX
  - **4 hallucination detectors** (EPR, Spilled Energy, CK-PLUG, TWAVE)
  - **Novel composite signals**: EEP, PTI, NEWI
+ - **Per-token thermodynamic telemetry** with risk scoring
+ - **VRAM-aware model filtering**: shows only models that fit your VRAM budget
 
+ ## Quick Start
+ 1. Open the Space
+ 2. Enter a prompt
+ 3. Click "Generate with NEXUS OS"
+ 4. Get real inference + thermodynamic risk analysis
 
  ## Repository
  [specimba/nexus-os-v2](https://huggingface.co/datasets/specimba/nexus-os-v2)
+
+ ## Troubleshooting
+ - **"HF Inference API unavailable"**: Your HF token may have exhausted its free credits. The Space will fall back to mock mode.
+ - **"Ollama relay unreachable"**: Check that your ngrok tunnel is active and that the URL in Space secrets is correct.
+ - **"Cloud API failed"**: Ensure API keys are added as Space secrets (not hardcoded).
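
Reviewer note: the backend order this README describes under "How It Works" (HF Inference API, then Ollama relay, then cloud APIs, then mock mode) can be read as a simple fallback chain. The sketch below is only an illustration of that ordering, not the Space's actual code. Every name in it (`generate_with_fallback`, `BackendUnavailable`, and the stubbed backend functions) is hypothetical, and the stubs simulate failures rather than calling any real service.

```python
from typing import Callable, List, Sequence, Tuple

# A backend takes a prompt and returns generated text, or raises on failure.
Backend = Callable[[str], str]


class BackendUnavailable(Exception):
    """Hypothetical error raised when a backend cannot serve a request."""


def generate_with_fallback(
    prompt: str, backends: Sequence[Tuple[str, Backend]]
) -> Tuple[str, str]:
    """Try each backend in order and return (backend_name, text) from the first success."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except BackendUnavailable as exc:
            # Record why this backend was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stub backends (assumptions for illustration only; no network calls are made).
def hf_inference(prompt: str) -> str:
    raise BackendUnavailable("free credits exhausted")


def ollama_relay(prompt: str) -> str:
    raise BackendUnavailable("relay unreachable")


def cloud_api(prompt: str) -> str:
    raise BackendUnavailable("no API key configured")


def mock_mode(prompt: str) -> str:
    # Mock mode always answers, which is why it sits last in the chain.
    return f"[mock] {prompt}"


# Order mirrors the numbered sections of the README.
CHAIN = [
    ("hf_inference", hf_inference),
    ("ollama_relay", ollama_relay),
    ("cloud_api", cloud_api),
    ("mock_mode", mock_mode),
]

if __name__ == "__main__":
    name, text = generate_with_fallback("Hello", CHAIN)
    print(name, text)
```

Keeping the chain as an ordered list of `(name, callable)` pairs means the reported backend name can be surfaced in the UI, which would make messages like the ones in the Troubleshooting section easier to map to a specific failing step.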