Chimera V3: Qwen 1.5B with Neural Foam Growth

A fine-tuned Qwen2.5-1.5B-Instruct with custom tool use, identity, and autonomous reasoning capabilities, trained with the Neural Foam architecture, which grows new neurons during training.

Key Results (Log-Likelihood Eval, n=200)

Capability             Chimera V3   Raw Qwen 1.5B   Delta
ARC-Easy               74.0%        77.5%           -3.5
ARC-Challenge          65.0%        70.0%           -5.0
Tool Use               10/10        0/10            +10
Identity (ATLES)       5/5          0/5             +5
Autonomous Reasoning   5/5          0/5             +5
Conversational         4/5          5/5             -1

Only a 3.5-point drop on ARC-Easy while adding four new capability dimensions.

What It Does

  • Tool Use: Outputs structured tags like [Using Math Tool], [TASK ADDED], [NOTE SAVED] for math, task management, and note-taking
  • Identity: Knows it's ATLES, created by Connor
  • Autonomous Reasoning: Uses <thinking> tags and structured problem-solving for debugging/ops questions
  • Science Reasoning: Retains 95% of base Qwen's ARC science benchmark performance

Architecture: Neural Foam V3

Instead of standard fine-tuning (which causes catastrophic forgetting), Chimera V3 uses Neural Foam Growth:

  1. Start from base Qwen2.5-1.5B-Instruct
  2. Convert FFN layers to GrowableLinear, layers that can dynamically add neurons
  3. Train on new capabilities (tool use, identity, autonomy)
  4. Memory Replay Buffer replays ARC examples at 20% ratio to preserve reasoning
  5. Growth decisions are based on gradient magnitude: new neurons are added where they are needed

The model grew +16 neurons across 2 FFN layers during training, with 5 dead neuron replacements (V3 recycling).
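The growth mechanism above can be sketched as a linear layer that appends output neurons when the per-neuron gradient magnitude is high. This is a minimal NumPy illustration of the idea; the class name `GrowableLinear` comes from the description above, but the threshold, initialization, and growth rule here are illustrative assumptions, not the actual Chimera V3 implementation.

```python
import numpy as np

class GrowableLinear:
    """Toy sketch of a linear layer that can grow new output neurons.

    Growth is triggered when the mean per-neuron gradient norm exceeds a
    threshold (an assumed stand-in for the gradient-magnitude rule above).
    """

    def __init__(self, in_features, out_features, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.W = self.rng.normal(0, 0.02, size=(out_features, in_features))
        self.b = np.zeros(out_features)

    @property
    def out_features(self):
        return self.W.shape[0]

    def forward(self, x):
        return x @ self.W.T + self.b

    def maybe_grow(self, grad_W, threshold=0.1, n_new=1):
        """Append n_new neurons if the mean per-neuron gradient norm is high."""
        per_neuron = np.linalg.norm(grad_W, axis=1)  # one norm per output neuron
        if per_neuron.mean() > threshold:
            new_W = self.rng.normal(0, 0.02, size=(n_new, self.W.shape[1]))
            self.W = np.vstack([self.W, new_W])
            self.b = np.concatenate([self.b, np.zeros(n_new)])
            return n_new
        return 0

layer = GrowableLinear(8, 4)
grad = np.full_like(layer.W, 0.5)   # large synthetic gradient -> triggers growth
layer.maybe_grow(grad)
print(layer.out_features)           # 5: one neuron was added
```

Because grown layers change shape, checkpoints saved after training carry the non-standard FFN dimensions noted in the Limitations section.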

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "spartan8806/chimera-v3-qwen-1.5b",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
tokenizer = AutoTokenizer.from_pretrained("spartan8806/chimera-v3-qwen-1.5b")

messages = [{"role": "user", "content": "What is 456 + 789?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to("cuda")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
# Output: [Using Math Tool] 456 + 789 = 1,245

Training Details

  • Base: Qwen2.5-1.5B-Instruct
  • Method: V3 Neural Foam Growth (replacement ON, grow-only FFN)
  • Data: Tool use, identity, autonomy, conversational + 20% ARC replay
  • Optimizer: AdamW 8-bit (bitsandbytes)
  • Precision: bfloat16
  • Hardware: RTX 3060 12GB
  • Training Time: 13 minutes, 5 epochs
  • Final Loss: 0.1653
  • NaN Issues: 0
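The 20% ARC replay mixing listed above can be sketched as follows: each training batch draws roughly 20% of its examples from a held-out ARC replay buffer and the rest from the new-capability data. Function and variable names here are assumptions for illustration, not the actual training code.

```python
import random

def mixed_batch(new_examples, replay_buffer, batch_size=10, replay_ratio=0.2,
                rng=None):
    """Build one training batch with a fixed fraction of replayed examples."""
    rng = rng or random.Random(0)
    n_replay = int(batch_size * replay_ratio)          # 2 of 10 at a 20% ratio
    batch = rng.sample(replay_buffer, n_replay)        # old ARC examples
    batch += rng.sample(new_examples, batch_size - n_replay)  # new-capability data
    rng.shuffle(batch)
    return batch

new_data = [f"tool_{i}" for i in range(100)]
arc_replay = [f"arc_{i}" for i in range(50)]
batch = mixed_batch(new_data, arc_replay)
print(sum(x.startswith("arc_") for x in batch))  # 2
```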

Eval Methodology

ARC scores use log-likelihood scoring (not generation-based), which is the standard for multiple-choice benchmarks. We compute the average per-token log P(answer | question) for each choice and select the highest-scoring one.

Generation-based eval (parsing model output for answer letters) drastically underestimates small-model performance: Qwen 1.5B scores 27% with generation eval vs. 77.5% with log-likelihood on the same ARC-Easy questions.
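The scoring described above can be sketched as follows: score each candidate answer by its length-normalized log-probability under the model and pick the argmax. Here `token_logprobs` stands in for a real model's per-token log-probabilities; the toy scorer below is a hypothetical placeholder, not the actual eval harness.

```python
import math

def pick_answer(question, choices, token_logprobs):
    """Select the choice with the highest average per-token log-probability.

    token_logprobs(question, choice) -> list of per-token log-probs for the
    choice conditioned on the question (supplied by a real model in practice).
    """
    best, best_score = None, -math.inf
    for choice in choices:
        lps = token_logprobs(question, choice)
        score = sum(lps) / len(lps)  # length normalization avoids biasing
        if score > best_score:       # toward short answers
            best, best_score = choice, score
    return best

# Toy scorer: pretends the model assigns higher probability to "mitochondria".
def toy_scorer(question, choice):
    base = -2.0 if choice == "mitochondria" else -5.0
    return [base] * max(len(choice.split()), 1)

print(pick_answer("What powers the cell?", ["nucleus", "mitochondria"], toy_scorer))
# mitochondria
```

This is why log-likelihood eval recovers the 77.5% ARC-Easy baseline: no answer-letter parsing is involved, so formatting quirks in generated text cannot cost the model credit.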

Limitations

  • 1.5B-parameter model, so it is limited by the base model's capabilities
  • Tool use is format-based (structured tags), not actual tool execution
  • Grown neurons create non-standard layer dimensions (some FFN layers are 8961 or 8967 instead of 8960)

Citation

Part of the ATLES project by Connor, using the Neural Foam architecture for growing neural networks during training.
