Chimera V3: Qwen 1.5B with Neural Foam Growth

A fine-tuned Qwen2.5-1.5B-Instruct with custom tool use, identity, and autonomous reasoning capabilities, trained with the Neural Foam architecture, which grows new neurons during training.

Key Results (Log-Likelihood Eval, n=200)

Capability             Chimera V3   Raw Qwen 1.5B   Delta
ARC-Easy               74.0%        77.5%           -3.5
ARC-Challenge          65.0%        70.0%           -5.0
Tool Use               10/10        0/10            +10
Identity (ATLES)       5/5          0/5             +5
Autonomous Reasoning   5/5          0/5             +5
Conversational         4/5          5/5             -1

Only a 3.5-point drop on ARC-Easy while adding four new capability dimensions.

What It Does

  • Tool Use: Outputs structured tags like [Using Math Tool], [TASK ADDED], [NOTE SAVED] for math, task management, and note-taking
  • Identity: Knows it's ATLES, created by Connor
  • Autonomous Reasoning: Uses <thinking> tags and structured problem-solving for debugging/ops questions
  • Science Reasoning: Retains 95% of base Qwen's ARC science benchmark performance

Architecture: Neural Foam V3

Instead of standard fine-tuning (which causes catastrophic forgetting), Chimera V3 uses Neural Foam Growth:

  1. Start from base Qwen2.5-1.5B-Instruct
  2. Convert FFN layers to GrowableLinear, layers that can dynamically add neurons
  3. Train on new capabilities (tool use, identity, autonomy)
  4. Memory Replay Buffer replays ARC examples at 20% ratio to preserve reasoning
  5. Growth decisions are based on gradient magnitude: new neurons are added where they are needed

The model grew +16 neurons across 2 FFN layers during training, with 5 dead neuron replacements (V3 recycling).
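The growth mechanism above can be sketched as a linear layer that appends output neurons when the per-neuron gradient magnitude is high. This is a minimal NumPy illustration of the idea; the class name `GrowableLinear` comes from the description above, but the threshold, initialization, and growth rule here are illustrative assumptions, not the actual Chimera V3 implementation.

```python
import numpy as np

class GrowableLinear:
    """Toy sketch of a linear layer that can grow new output neurons.

    Growth is triggered when the mean per-neuron gradient norm exceeds a
    threshold (an assumed stand-in for the gradient-magnitude rule above).
    """

    def __init__(self, in_features, out_features, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.W = self.rng.normal(0, 0.02, size=(out_features, in_features))
        self.b = np.zeros(out_features)

    @property
    def out_features(self):
        return self.W.shape[0]

    def forward(self, x):
        return x @ self.W.T + self.b

    def maybe_grow(self, grad_W, threshold=0.1, n_new=1):
        """Append n_new neurons if the mean per-neuron gradient norm is high."""
        per_neuron = np.linalg.norm(grad_W, axis=1)  # one norm per output neuron
        if per_neuron.mean() > threshold:
            new_W = self.rng.normal(0, 0.02, size=(n_new, self.W.shape[1]))
            self.W = np.vstack([self.W, new_W])
            self.b = np.concatenate([self.b, np.zeros(n_new)])
            return n_new
        return 0

layer = GrowableLinear(8, 4)
grad = np.full_like(layer.W, 0.5)   # large synthetic gradient -> triggers growth
layer.maybe_grow(grad)
print(layer.out_features)           # 5: one neuron was added
```

Because grown layers change shape, checkpoints saved after training carry the non-standard FFN dimensions noted in the Limitations section.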

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "spartan8806/chimera-v3-qwen-1.5b",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained("spartan8806/chimera-v3-qwen-1.5b")

messages = [{"role": "user", "content": "What is 456 + 789?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Output: [Using Math Tool] 456 + 789 = 1,245

Training Details

  • Base: Qwen2.5-1.5B-Instruct
  • Method: V3 Neural Foam Growth (replacement ON, grow-only FFN)
  • Data: Tool use, identity, autonomy, conversational + 20% ARC replay
  • Optimizer: AdamW 8-bit (bitsandbytes)
  • Precision: bfloat16
  • Hardware: RTX 3060 12GB
  • Training Time: 13 minutes, 5 epochs
  • Final Loss: 0.1653
  • NaN Issues: 0
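The 20% ARC replay mixing listed above can be sketched as follows: each training batch draws roughly 20% of its examples from a held-out ARC replay buffer and the rest from the new-capability data. Function and variable names here are assumptions for illustration, not the actual training code.

```python
import random

def mixed_batch(new_examples, replay_buffer, batch_size=10, replay_ratio=0.2,
                rng=None):
    """Build one training batch with a fixed fraction of replayed examples."""
    rng = rng or random.Random(0)
    n_replay = int(batch_size * replay_ratio)          # 2 of 10 at a 20% ratio
    batch = rng.sample(replay_buffer, n_replay)        # old ARC examples
    batch += rng.sample(new_examples, batch_size - n_replay)  # new-capability data
    rng.shuffle(batch)
    return batch

new_data = [f"tool_{i}" for i in range(100)]
arc_replay = [f"arc_{i}" for i in range(50)]
batch = mixed_batch(new_data, arc_replay)
print(sum(x.startswith("arc_") for x in batch))  # 2
```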

Eval Methodology

ARC scores use log-likelihood scoring (not generation-based), which is the standard for multiple-choice benchmarks. We compute the average per-token log P(answer | question) for each choice and select the highest-scoring one.

Generation-based eval (parsing model output for answer letters) drastically underestimates small-model performance: Qwen 1.5B scores 27% with generation eval vs. 77.5% with log-likelihood on the same ARC-Easy questions.
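The scoring described above can be sketched as follows: score each candidate answer by its length-normalized log-probability under the model and pick the argmax. Here `token_logprobs` stands in for a real model's per-token log-probabilities; the toy scorer below is a hypothetical placeholder, not the actual eval harness.

```python
import math

def pick_answer(question, choices, token_logprobs):
    """Select the choice with the highest average per-token log-probability.

    token_logprobs(question, choice) -> list of per-token log-probs for the
    choice conditioned on the question (supplied by a real model in practice).
    """
    best, best_score = None, -math.inf
    for choice in choices:
        lps = token_logprobs(question, choice)
        score = sum(lps) / len(lps)  # length normalization avoids biasing
        if score > best_score:       # toward short answers
            best, best_score = choice, score
    return best

# Toy scorer: pretends the model assigns higher probability to "mitochondria".
def toy_scorer(question, choice):
    base = -2.0 if choice == "mitochondria" else -5.0
    return [base] * max(len(choice.split()), 1)

print(pick_answer("What powers the cell?", ["nucleus", "mitochondria"], toy_scorer))
# mitochondria
```

This is why log-likelihood eval recovers the 77.5% ARC-Easy baseline: no answer-letter parsing is involved, so formatting quirks in generated text cannot cost the model credit.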

Limitations

  • 1.5B-parameter model, so it is limited by the base model's capabilities
  • Tool use is format-based (structured tags), not actual tool execution
  • Grown neurons create non-standard layer dimensions (some FFN layers are 8961 or 8967 instead of 8960)

Citation

Part of the ATLES project by Connor, using the Neural Foam architecture for growing neural networks during training.
