Chimera V3: Qwen 1.5B with Neural Foam Growth
A fine-tuned Qwen2.5-1.5B-Instruct with custom tool use, identity, and autonomous reasoning capabilities, trained using the Neural Foam growth architecture that grows new neurons during training.
Key Results (Log-Likelihood Eval, n=200)
| Capability | Chimera V3 | Raw Qwen 1.5B | Delta |
|---|---|---|---|
| ARC-Easy | 74.0% | 77.5% | -3.5 |
| ARC-Challenge | 65.0% | 70.0% | -5.0 |
| Tool Use | 10/10 | 0/10 | +10 |
| Identity (ATLES) | 5/5 | 0/5 | +5 |
| Autonomous Reasoning | 5/5 | 0/5 | +5 |
| Conversational | 4/5 | 5/5 | -1 |
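ARC-Easy drops by only 3.5 percentage points (ARC-Challenge by 5.0) while the model gains tool use, identity, and autonomous reasoning behaviors that the base model scores 0 on.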
What It Does
- Tool Use: Outputs structured tags like [Using Math Tool], [TASK ADDED], [NOTE SAVED] for math, task management, and note-taking
- Identity: Knows it's ATLES, created by Connor
- Autonomous Reasoning: Uses <thinking> tags and structured problem-solving for debugging/ops questions
- Science Reasoning: Retains about 95% of base Qwen's ARC-Easy score (74.0% vs 77.5%) and about 93% of its ARC-Challenge score (65.0% vs 70.0%)
Architecture: Neural Foam V3
Instead of standard fine-tuning, which is prone to catastrophic forgetting, Chimera V3 uses Neural Foam Growth:
- Start from base Qwen2.5-1.5B-Instruct
- Convert FFN layers to GrowableLinear layers, which can dynamically add neurons
- Train on new capabilities (tool use, identity, autonomy)
- Memory Replay Buffer replays ARC examples at 20% ratio to preserve reasoning
- Growth decisions are based on gradient magnitude: new neurons are born where needed
The model grew +16 neurons across 2 FFN layers during training, with 5 dead neuron replacements (V3 recycling).
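To make the growth step concrete, here is a minimal pure-Python sketch of a feed-forward block whose hidden width can grow. It is not the actual GrowableLinear implementation: the class `GrowableFFN`, the function `should_grow`, the ReLU activation (Qwen's FFN is gated), the zero-initialized output weights, and the mean-gradient threshold rule are all assumptions chosen for illustration.

```python
import random


class GrowableFFN:
    """Toy two-layer ReLU FFN whose hidden width can grow (hypothetical sketch)."""

    def __init__(self, d_model, d_hidden, seed=0):
        self.rng = random.Random(seed)
        self.d_model = d_model
        # up[j]: input weights of hidden neuron j; down[j]: its output weights.
        self.up = [self._random_row() for _ in range(d_hidden)]
        self.down = [self._random_row() for _ in range(d_hidden)]

    def _random_row(self):
        return [self.rng.uniform(-0.1, 0.1) for _ in range(self.d_model)]

    def forward(self, x):
        out = [0.0] * self.d_model
        for w_in, w_out in zip(self.up, self.down):
            h = max(0.0, sum(a * b for a, b in zip(w_in, x)))  # ReLU activation
            for i, w in enumerate(w_out):
                out[i] += h * w
        return out

    def grow(self, n_new):
        """Append n_new hidden neurons without changing the current output."""
        for _ in range(n_new):
            self.up.append(self._random_row())
            # Assumption: zero output weights, so a new neuron contributes
            # nothing until training moves its weights away from zero.
            self.down.append([0.0] * self.d_model)


def should_grow(grad_magnitudes, threshold):
    """Assumed rule: grow when the mean gradient magnitude exceeds a threshold."""
    return sum(grad_magnitudes) / len(grad_magnitudes) > threshold


ffn = GrowableFFN(d_model=4, d_hidden=8)
x = [1.0, -2.0, 0.5, 3.0]
before = ffn.forward(x)
if should_grow([0.9, 1.1, 1.4], threshold=1.0):
    ffn.grow(1)
print(len(ffn.up), ffn.forward(x) == before)
```

Zero-initializing the output weights is one common way to keep the network's function unchanged at the moment of growth; whether Neural Foam does this is not stated in this card.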
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model in bfloat16 on the GPU
model = AutoModelForCausalLM.from_pretrained(
    "spartan8806/chimera-v3-qwen-1.5b",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained("spartan8806/chimera-v3-qwen-1.5b")

# Format the request with the model's chat template
messages = [{"role": "user", "content": "What is 456 + 789?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

# Greedy decoding, then print only the newly generated tokens
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Output: [Using Math Tool] 456 + 789 = 1,245
Training Details
- Base: Qwen2.5-1.5B-Instruct
- Method: V3 Neural Foam Growth (replacement ON, grow-only FFN)
- Data: Tool use, identity, autonomy, conversational + 20% ARC replay
- Optimizer: AdamW 8-bit (bitsandbytes)
- Precision: bfloat16
- Hardware: RTX 3060 12GB
- Training Time: 13 minutes, 5 epochs
- Final Loss: 0.1653
- NaN Issues: 0
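The 20% ARC replay listed under Data can be pictured as mixing replayed examples into the new-capability data before each epoch. The sketch below is one possible reading, assuming "20%" means replayed items make up 20% of the final mix; the function name `mix_with_replay` and its parameters are hypothetical and not taken from the training code.

```python
import random


def mix_with_replay(new_examples, replay_examples, replay_ratio=0.2, seed=0):
    """Return a shuffled list in which about `replay_ratio` of items are replayed."""
    if not 0.0 <= replay_ratio < 1.0:
        raise ValueError("replay_ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_replay / (n_new + n_replay) = replay_ratio for n_replay.
    n_replay = round(len(new_examples) * replay_ratio / (1.0 - replay_ratio))
    replayed = [rng.choice(replay_examples) for _ in range(n_replay)]
    mixed = [("new", ex) for ex in new_examples]
    mixed += [("replay", ex) for ex in replayed]
    rng.shuffle(mixed)
    return mixed


# 80 new examples plus 20 replayed ones gives a 20% replay share.
batch = mix_with_replay(list(range(80)), ["arc-1", "arc-2", "arc-3"])
n_replayed = sum(1 for source, _ in batch if source == "replay")
print(n_replayed, len(batch))
```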
Eval Methodology
ARC scores use log-likelihood scoring (not generation-based), which is the standard for multiple-choice benchmarks. We compute avg log P(answer | question) for each choice and select the highest.
Generation-based eval (parsing model output for answer letters) drastically underestimates small model performance: Qwen 1.5B scores 27% with generation eval vs 77.5% with log-likelihood on the same ARC-Easy questions.
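The selection step of log-likelihood scoring can be sketched as follows. This assumes "avg log P" means the mean log-probability per answer token; in a real eval the per-token values come from a model forward pass, whereas the numbers below are made up for illustration and `pick_answer` is a hypothetical helper name.

```python
def pick_answer(choice_token_logprobs):
    """Return (best_index, averages) using the mean token log-probability per choice."""
    averages = [sum(lps) / len(lps) for lps in choice_token_logprobs]
    best = max(range(len(averages)), key=averages.__getitem__)
    return best, averages


# Made-up log-probabilities of each answer's tokens, given the question.
choices = [
    [-2.1, -0.9],        # choice A
    [-0.4, -0.3, -0.5],  # choice B
    [-3.0],              # choice C
    [-1.2, -1.0],        # choice D
]
best, averages = pick_answer(choices)
print(best)  # 1, i.e. choice B
```

Averaging per token rather than summing keeps longer answers from being penalized just for having more tokens.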
Limitations
- 1.5B parameter model: limited by base model capabilities
- Tool use is format-based (structured tags), not actual tool execution
- Grown neurons create non-standard layer dimensions (some FFN layers are 8961 or 8967 instead of 8960)
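Because the model only emits tags and does not execute tools, a caller that wants real tool behavior has to parse the tags and dispatch them. The sketch below shows one way to do that; the helper names (`extract_tags`, `handle_output`, `recompute_sum`) and the handler mapping are hypothetical and not part of the model or the ATLES project.

```python
import re

# Matches bracketed tags made of letters and spaces, e.g. [Using Math Tool].
TAG_PATTERN = re.compile(r"\[([A-Za-z ]+)\]")


def extract_tags(text):
    """Return bracketed tag names in order of appearance."""
    return [m.group(1).strip() for m in TAG_PATTERN.finditer(text)]


def recompute_sum(text):
    """Hypothetical math handler: recompute the first 'a + b' found in the text."""
    m = re.search(r"(\d+)\s*\+\s*(\d+)", text)
    return int(m.group(1)) + int(m.group(2)) if m else None


def handle_output(text, handlers):
    """Call the handler registered for each recognized tag; skip unknown tags."""
    results = []
    for tag in extract_tags(text):
        handler = handlers.get(tag)
        if handler is not None:
            results.append(handler(text))
    return results


output = "[Using Math Tool] 456 + 789 = 1,245"
print(extract_tags(output))
print(handle_output(output, {"Using Math Tool": recompute_sum}))
```

Recomputing the result in the handler, rather than trusting the number in the model's text, is the point of wiring a tag to an actual tool.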
Citation
Part of the ATLES project by Connor. Neural Foam architecture for growing neural networks during training.