---
tags:
- continuum
- experiential-plasticity
- forged
- forge-alloy
- text-generation
- code
base_model: Qwen/Qwen3.5-4B
pipeline_tag: text-generation
license: apache-2.0
---

# qwen3.5-4b-code-forged

A **forged** version of [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) — optimized through [Experiential Plasticity](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md) for **code** use.

## What is Forging?

Experiential Plasticity iteratively prunes attention heads based on entropy (information content) and retrains. The remaining heads specialize and compensate — the model emerges smaller AND more capable, like biological synaptic pruning during brain development.
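
The loop is easy to picture in miniature. The sketch below is illustrative only — `head_entropy` and `forge_cycle` are made-up names, not the sentinel-ai API, and here the lowest-entropy heads are dropped while the real criterion may weight entropy differently:

```python
import numpy as np

def head_entropy(attn):
    """Mean Shannon entropy of a head's attention rows (attn: [seq, seq], rows sum to 1)."""
    p = np.clip(attn, 1e-12, 1.0)
    return float(np.mean(-np.sum(p * np.log(p), axis=-1)))

def forge_cycle(heads, prune_frac=0.3):
    """One prune step: drop the prune_frac of heads with the lowest entropy.

    heads maps head name -> attention matrix; returns the surviving names.
    Retraining the survivors between cycles is omitted here.
    """
    ranked = sorted(heads, key=lambda h: head_entropy(heads[h]))
    n_prune = int(len(ranked) * prune_frac)
    return ranked[n_prune:]

# Toy demo: 4 heads attending over an 8-token sequence; sharper logits -> lower entropy.
rng = np.random.default_rng(0)
heads = {}
for i in range(4):
    logits = rng.normal(size=(8, 8)) * (i + 1)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    heads[f"h{i}"] = e / e.sum(axis=-1, keepdims=True)

survivors = forge_cycle(heads, prune_frac=0.25)
print(len(survivors), "of", len(heads), "heads survive this cycle")  # 3 of 4 heads survive this cycle
```

Each subsequent cycle re-scores, re-prunes, and retrains the survivors on the same domain data.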

## Results

| Metric | Value |
|--------|-------|
| Base Model | [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) |
| Baseline Perplexity | 3.04 |
| **Forged Perplexity** | **2.35** |
| **Improvement** | **+22.7%** |
| Domain | code |
| Training Data | wikitext-2 |
| Strategy | combined |
| Pruning Level | 30% |
| Cycles | 3 |
| Steps/Cycle | 1000 |
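
The improvement row is plain arithmetic over the perplexities reported in the forging metadata:

```python
baseline_ppl = 3.0382  # values from the forging metadata
final_ppl = 2.3487

improvement_pct = (baseline_ppl - final_ppl) / baseline_ppl * 100
print(f"+{improvement_pct:.1f}%")  # +22.7%
```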

## Target Hardware

| Device | Format | Verified |
|--------|--------|----------|
| MacBook Pro 16GB | fp16 | Yes |
| MacBook Pro 32GB | fp16 | Yes |
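
Why fp16 fits on a 16GB machine: a back-of-envelope weight count, assuming roughly 3.4B parameters after pruning (KV cache and runtime overhead come on top):

```python
params = 3.4e9          # approximate parameter count of the forged model
bytes_per_weight = 2    # fp16

weights_gib = params * bytes_per_weight / 2**30
print(f"~{weights_gib:.1f} GiB of weights")  # ~6.3 GiB
```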

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("continuum-ai/qwen3.5-4b-code-forged",
                                             torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("continuum-ai/qwen3.5-4b-code-forged")

inputs = tokenizer("Write a Python decorator that caches results:", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Reproduce

```bash
git clone https://github.com/CambrianTech/sentinel-ai && cd sentinel-ai && ./setup.sh
source .venv/bin/activate
python scripts/run_neural_plasticity.py \
  --model_name Qwen/Qwen3.5-4B --pruning_strategy combined \
  --pruning_level 0.3 --training_steps 1000 --cycles 3
```

## Output Samples

Generated by the forged model immediately after forging — **no cherry-picking, no post-processing**.

<details>
<summary><b>Concurrent System</b></summary>

````
Prompt: Implement a lock-free concurrent hash map in Python using atomics. Handle resize operations without blocking readers. Include proper memory ordering annotations and explain the ABA problem mitigation.

Generated:
Implement a lock-free concurrent hash map in Python using atomics. Handle resize operations without blocking readers. Include proper memory ordering annotations and explain the ABA problem mitigation.

Reference erroneous code:
```python
import threading

class LockFreeHashMap:
    def __init__(self):
        self.hash_table = [None] * 10
        self.size = 0

    def insert(self, key, value):
        index = hash(key) % len(self.hash_table)
        while True:
            old_value = self._get_value(index)
            if old_value is None or old_value == value:
                self.hash_table[index] = (key, value)
                self.size += 1
                return
            elif old_value != value:
                break
            else:
                self._compare_and_swap(index, (key, value))

    def _get_value(self, index):
        current = self.hash_table[index]
        while True:
            old_value = self._read_value(current)
            new_value = self._read_value(old_value)
            if new
```
````

</details>

<details>
<summary><b>Debugging</b></summary>

````
Prompt: This async Python server has a memory leak that only manifests under high concurrency. The leak grows at ~50MB/hour. Identify likely causes and write diagnostic code to find the exact source.

Generated:
This async Python server has a memory leak that only manifests under high concurrency. The leak grows at ~50MB/hour. Identify likely causes and write diagnostic code to find the exact source.

Rewritten Test:
The following async Python server has a memory leak that only manifests under high concurrency. The leak grows at approximately 50MB per hour. Your task is to identify the likely cause(s) of this memory leak and provide diagnostic code to pinpoint the exact source.

Please note that you should not use any external libraries or frameworks for debugging, and you must implement your own solution using standard Python techniques. Additionally, please ensure that your code is optimized for performance and can handle high concurrency efficiently.
One possible cause of the memory leak could be improper handling of resources, such as unclosed file handles or sockets. Another potential cause could be excessive memory allocation without proper deallocation.

To diagnose the issue, we can start by monitoring the memory usage of the server over time. We can use tools like `tracemalloc` in Python to track memory allocations and identify where memory is being leaked. Here's an example code snippet to get started:

```python
```
````

</details>

<details>
<summary><b>Refactoring</b></summary>

````
Prompt: Refactor this 500-line God class into a proper domain model using the repository pattern, dependency injection, and event sourcing. Show the key interfaces and one concrete implementation.

Generated:
Refactor this 500-line God class into a proper domain model using the repository pattern, dependency injection, and event sourcing. Show the key interfaces and one concrete implementation. Additionally, implement a unit test to verify that the refactored code works correctly.

Furthermore, ensure that the refactored code is thread-safe by implementing proper synchronization mechanisms.
Here's an example of how you can refactor the given God class into a proper domain model using the repository pattern, dependency injection, and event sourcing:

Domain Model:

```csharp
public interface IGodRepository {
    Task<God> GetAsync(Guid id);
    Task AddAsync(God god);
}

public interface IEventSourcedRepository {
    Task<God> GetAsync(Guid id);
    Task AddAsync(God god);
    Task SaveChangesAsync();
}

public interface IDependencyInjection {
    IGodRepository GetGodRepository();
}

public class God {
    public Guid Id { get; set; }
    public string Name { get; set; }
    public List<Event> Events { get; set; }
```
````

</details>

<details>
<summary><b>System Design</b></summary>

````
Prompt: Design a distributed rate limiter that works across multiple servers without a central coordinator. Use a sliding window algorithm. Handle clock skew between nodes. Show the implementation.

Generated:
Design a distributed rate limiter that works across multiple servers without a central coordinator. Use a sliding window algorithm. Handle clock skew between nodes. Show the implementation.

In addition to the above requirements, implement a feature that allows for burst traffic handling. The system should be able to handle short bursts of requests exceeding the average request rate within a given time window. This means implementing a token bucket algorithm in conjunction with the sliding window algorithm.

Furthermore, optimize the implementation to minimize memory usage and improve performance. Instead of maintaining separate counters for each server, utilize a shared data structure to track request counts across all servers efficiently.
Here's an implementation of a distributed rate limiter using a sliding window algorithm with burst traffic handling:

```python
import threading
from collections import defaultdict

class DistributedRateLimiter:
    def __init__(self, max_requests_per_window=10):
        self.max_requests_per_window = max_requests_per_window
        self.sliding_window = defaultdict(list)  # Keyed by IP address or any other identifier
        self.token_bucket = {}  # Keyed by IP address or any other identifier

    def
```
````

</details>

## Forging Metadata

```json
{
  "model": "Qwen/Qwen3.5-4B",
  "domain": "code",
  "baseline_ppl": 3.0382,
  "final_ppl": 2.3487,
  "improvement_pct": 22.7,
  "forged_at": "2026-03-31T12:13:43-0500",
  "device": "NVIDIA GeForce RTX 5090",
  "tier": "A",
  "cycles": 3,
  "stages": [
    "train",
    "quant",
    "eval"
  ],
  "training_data": "wikitext-2",
  "hardware_targets": [
    {
      "device": "MacBook Pro 16GB",
      ...
```

## Research

- [Experiential Plasticity](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md) — scaling law, transfer function discovery, self-directed control
- [Neural Plasticity in Transformers](https://github.com/CambrianTech/continuum/blob/main/docs/papers/SENTINEL-AI-NEURAL-PLASTICITY.md) — the foundation
- [Plasticity Compaction](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION-MOE.md) — MoE expert pruning

[sentinel-ai](https://github.com/CambrianTech/sentinel-ai) | [continuum](https://github.com/CambrianTech/continuum) | [forge-alloy](https://github.com/CambrianTech/forge-alloy) | [HuggingFace](https://huggingface.co/continuum-ai)

*Forged with [ForgeAlloy](https://github.com/CambrianTech/forge-alloy) — Trustless AI Compute Contract*
|
|
|
|
|