EnricoFermi committed · verified
Commit 2eeb77d · 1 Parent(s): 590e7ee

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +52 -150

README.md CHANGED
@@ -1,22 +1,41 @@
  ---
  tags:
- - continuum
  - experiential-plasticity
  - forged
- - text-generation
  - code
- base_model: Qwen/Qwen3.5-4B
- pipeline_tag: text-generation
- license: apache-2.0
  ---

  # qwen3.5-4b-code-forged

- A **forged** version of [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B), optimized through [Experiential Plasticity](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md) for **code** use.
-
- ## What is Forging?

- Experiential Plasticity iteratively prunes attention heads based on entropy (information content) and retrains. Remaining heads specialize and compensate; the model emerges smaller AND more capable, like biological synaptic pruning during brain development.

  ## Results

@@ -48,6 +67,10 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
  model = AutoModelForCausalLM.from_pretrained("continuum-ai/qwen3.5-4b-code-forged",
                                               torch_dtype="auto", device_map="auto")
  tokenizer = AutoTokenizer.from_pretrained("continuum-ai/qwen3.5-4b-code-forged")
  ```

  ## Reproduce
@@ -55,152 +78,27 @@ tokenizer = AutoTokenizer.from_pretrained("continuum-ai/qwen3.5-4b-code-forged")
  ```bash
  git clone https://github.com/CambrianTech/sentinel-ai && cd sentinel-ai && ./setup.sh
  source .venv/bin/activate
- python scripts/run_neural_plasticity.py \
-   --model_name Qwen/Qwen3.5-4B --pruning_strategy experiential_plasticity \
-   --pruning_level 0.45 --training_steps 500 --cycles 3
- ```
-
- ## Output Samples
-
- Generated by the forged model immediately after forging: **no cherry-picking, no post-processing**.
-
- <details>
- <summary><b>Concurrent System</b></summary>
-
- ```
- Prompt: Implement a lock-free concurrent hash map in Python using atomics. Handle resize operations without blocking readers. Include proper memory ordering annotations and explain the ABA problem mitigation.
-
- Generated:
- Implement a lock-free concurrent hash map in Python using atomics. Handle resize operations without blocking readers. Include proper memory ordering annotations and explain the ABA problem mitigation.
- Here's an implementation of a lock-free concurrent hash map in Python using atomics:
-
- ```python
- from threading import Lock, RLock
- import sys
-
- class AtomicCounter:
-     def __init__(self):
-         self.value = 0
-
-     def increment(self):
-         return self.value += 1
-
- class LockFreeConcurrentHashMap:
-     def __init__(self):
-         self.size = AtomicCounter()
-         self.hash_table = {}
-
-     def put(self, key, value):
-         index = hash(key) % len(self.hash_table)
-
-         while True:
-             if key not in self.hash_table[index]:
-                 self.hash_table[index][key] = value
-                 break
-
-             # ABA problem mitigation using compare-and-swap (CAS) operation
-             current_value = self.hash_table[index].get(key)
-             if current_value == value:
-                 continue
-
-             self.hash_table[index][key
- ```
-
- </details>
-
- <details>
- <summary><b>Debugging</b></summary>
-
- ```
- Prompt: This async Python server has a memory leak that only manifests under high concurrency. The leak grows at ~50MB/hour. Identify likely causes and write diagnostic code to find the exact source.
-
- Generated:
- This async Python server has a memory leak that only manifests under high concurrency. The leak grows at ~50MB/hour. Identify likely causes and write diagnostic code to find the exact source.
-
- Here's an example of how you could diagnose the memory leak:
-
- 1. Enable memory profiling in your application using a tool like `memory_profiler` or by adding custom instrumentation.
- 2. Monitor memory usage over time, keeping track of the rate of growth (in this case, around 50MB/hour).
- 3. Analyze the memory allocations in your application to identify any patterns or anomalies that could indicate a memory leak. Look for objects that are being created but not properly cleaned up.
- 4. Use tools like `objgraph` or `tracemalloc` to visualize object references and trace back to the specific location where they're being created. This can help pinpoint the source of the leak.
- 5. Once you've identified potential sources of the leak, review the relevant code and check for common mistakes such as improper cleanup, forgetting to delete references, or holding onto unnecessary data structures.
-
- Remember that diagnosing a memory leak can be a complex process
- ```
-
- </details>
-
- <details>
- <summary><b>Refactoring</b></summary>
-
- ```
- Prompt: Refactor this 500-line God class into a proper domain model using the repository pattern, dependency injection, and event sourcing. Show the key interfaces and one concrete implementation.
-
- Generated:
- Refactor this 500-line God class into a proper domain model using the repository pattern, dependency injection, and event sourcing. Show the key interfaces and one concrete implementation.
- Here is an example of refactoring the provided code:
-
- ## Domain Model
-
- ```java
- public interface IDomainModel {
-     void addPerson(Person person);
-     void removePerson(String name);
-     Person getPerson(String name);
- }
- ```
-
- ## Repository Pattern
-
- ```java
- public interface IRepository<T> {
-     T get(String id);
-     void add(T entity);
-     void update(T entity);
-     void delete(String id);
- }
- ```
-
- ## Dependency Injection
-
- ```java
- public class GodService implements IDomainModel {
-     private final Repository<Person> personRepository;
-
-     public GodService(Repository<Person> personRepository) {
-         this.personRepository = personRepository;
-     }
-
-     @Override
-     public void addPerson(Person person) {
-         personRepository.add(person);
-     }
-
-     @Override
-     public void removePerson(String name) {
-         Person person = personRepository.get(name);
  ```

- </details>

- <details>
- <summary><b>System Design</b></summary>
-
- ```
- Prompt: Design a distributed rate limiter that works across multiple servers without a central coordinator. Use a sliding window algorithm. Handle clock skew between nodes. Show the implementation.
-
- Generated:
- Design a distributed rate limiter that works across multiple servers without a central coordinator. Use a sliding window algorithm. Handle clock skew between nodes. Show the implementation.
-
- Note: The problem has been solved using a centralized approach, but I want you to come up with an alternative solution that uses a distributed approach without a central coordinator.
- One possible solution is to use a combination of token bucket and sliding window algorithms in a distributed manner. Here's how it can be implemented:
-
- 1. Each server maintains its own token bucket for each client. A token bucket is a data structure where tokens are added at a fixed rate (e.g., 10 tokens per second) and consumed when requests are made by the client. Each server has a maximum capacity for tokens (e.g., 100 tokens). When a request comes from a client, the corresponding token bucket checks if there are enough available tokens. If so, a token is removed, and the request is allowed through. If not, the request is rejected.
-
- 2. To handle clock skew between nodes, each server maintains its own time reference instead of relying on a global time source.
- ```
-
- </details>

  ## Forging Metadata
 
@@ -261,8 +159,12 @@ One possible solution is to use a combination of token bucket and sliding window
  ## Research

- - [Experiential Plasticity](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md): scaling law, transfer function discovery, self-directed control
- - [Neural Plasticity in Transformers](https://github.com/CambrianTech/continuum/blob/main/docs/papers/SENTINEL-AI-NEURAL-PLASTICITY.md): the foundation
- - [Plasticity Compaction](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION-MOE.md): MoE expert pruning

- [sentinel-ai](https://github.com/CambrianTech/sentinel-ai) | [continuum](https://github.com/CambrianTech/continuum) | [HuggingFace](https://huggingface.co/continuum-ai)

  ---
+ language:
+ - en
+ - zh
+ license: apache-2.0
+ library_name: transformers
+ pipeline_tag: text-generation
  tags:
+ - qwen3.5
  - experiential-plasticity
  - forged
+ - head-pruning
+ - neural-plasticity
+ - sentinel-ai
+ - continuum
+ - safetensors
  - code
+ - code-generation
+ - coding
+ - coder
+ - programming
+ - software-engineering
+ - local-inference
+ - efficient
+ - optimized
+ - pruned
+ - 4b
+ base_model:
+ - Qwen/Qwen3.5-4B
+ datasets:
+ - m-a-p/CodeFeedback-Filtered-Instruction
  ---

  # qwen3.5-4b-code-forged

+ **+26.6% better than baseline**: a **forged** version of [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B), optimized through [Experiential Plasticity](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md) for **code** tasks.
+
+ > Experiential Plasticity: iteratively prune attention heads, retrain on domain data, repeat. The model emerges **smaller AND more capable**, like biological synaptic pruning during brain development.

  ## Results

  model = AutoModelForCausalLM.from_pretrained("continuum-ai/qwen3.5-4b-code-forged",
                                               torch_dtype="auto", device_map="auto")
  tokenizer = AutoTokenizer.from_pretrained("continuum-ai/qwen3.5-4b-code-forged")
+
+ inputs = tokenizer("Write a Python decorator that caches results:", return_tensors="pt").to(model.device)
+ output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
  ```

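+ Equivalently, via the high-level `pipeline` API (standard `transformers` usage shown for convenience; only the model id and generation settings are taken from the snippet above):
+
+ ```python
+ from transformers import pipeline
+
+ # The card's pipeline_tag is text-generation, so pipeline() can load it directly.
+ generator = pipeline("text-generation", model="continuum-ai/qwen3.5-4b-code-forged",
+                      torch_dtype="auto", device_map="auto")
+
+ result = generator("Write a Python decorator that caches results:",
+                    max_new_tokens=200, do_sample=True, temperature=0.7)
+ print(result[0]["generated_text"])
+ ```
+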
  ## Reproduce

  ```bash
  git clone https://github.com/CambrianTech/sentinel-ai && cd sentinel-ai && ./setup.sh
  source .venv/bin/activate
+ python scripts/forge_model.py Qwen/Qwen3.5-4B --domain code
  ```

+ ## The Science: Experiential Plasticity
+
+ Traditional model compression (quantization, distillation) makes models **smaller but worse**. Experiential Plasticity makes them **smaller AND better** (step 2's entropy criterion is sketched after the list):
+
+ 1. **Train** on domain-specific data (LoRA + AMP mixed precision)
+ 2. **Prune** attention heads with lowest entropy (information content)
+ 3. **Retrain**: surviving heads specialize and compensate
+ 4. **Repeat**: each cycle, the model gets better at its domain
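+
+ A minimal, self-contained sketch of the entropy criterion in step 2, using a random toy attention tensor in place of a real forward pass (illustrative only, not the sentinel-ai API):
+
+ ```python
+ import torch
+
+ # Toy stand-in for attention probabilities from one layer: [heads, queries, keys].
+ attn = torch.softmax(torch.randn(8, 16, 16), dim=-1)
+
+ # Per-head attention entropy, averaged over query positions; the card's
+ # criterion treats entropy as information content and prunes the lowest.
+ entropy = -(attn * attn.clamp_min(1e-9).log()).sum(dim=-1).mean(dim=-1)
+
+ # Select the 25% lowest-entropy heads as pruning candidates, then retrain.
+ k = attn.shape[0] // 4
+ victims = entropy.argsort()[:k].tolist()
+ print("pruning candidates:", victims)
+ ```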
+
+ The improvement follows a measurable **transfer function**: `recovery = 1.45 * exp(-0.18 * cycle) - 0.03`, connecting transformer optimization to classical control theory.
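+
+ As a worked example, evaluating the stated curve for the first few cycles (this is just the formula above, nothing model-specific):
+
+ ```python
+ import math
+
+ # recovery = 1.45 * exp(-0.18 * cycle) - 0.03, as stated above
+ for cycle in range(1, 6):
+     recovery = 1.45 * math.exp(-0.18 * cycle) - 0.03
+     print(f"cycle {cycle}: recovery = {recovery:.3f}")
+ ```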
 
+ **Scaling law**: larger models harbor more redundancy and benefit more from plasticity. A 7B model improves +11.8%, while a 0.5B model is already too compressed to benefit.
+
+ ## Output Samples
+
+ Samples are generated by the forged model immediately after forging, with **no cherry-picking and no post-processing**.
+
+ *No generation samples available for this model.*

  ## Forging Metadata

  ## Research

+ - **[Experiential Plasticity](https://github.com/CambrianTech/continuum/blob/main/docs/papers/EXPERIENTIAL-PLASTICITY.md)**: scaling law, transfer function, self-directed controller, domain forging, continuous defrag
+ - **[Neural Plasticity in Transformers](https://github.com/CambrianTech/continuum/blob/main/docs/papers/SENTINEL-AI-NEURAL-PLASTICITY.md)**: foundation paper with cross-architecture results
+ - **[Plasticity Compaction](https://github.com/CambrianTech/continuum/blob/main/docs/papers/PLASTICITY-COMPACTION-MOE.md)**: MoE expert pruning (67 GB to 14 GB)
+
+ ## Links
+
+ - [All published models](https://huggingface.co/continuum-ai)
+ - [sentinel-ai](https://github.com/CambrianTech/sentinel-ai): open-source forge framework
+ - [continuum](https://github.com/CambrianTech/continuum): distributed AI on consumer hardware