kappatune-lora-tinyllama-agnews

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 on the AG News dataset. It was trained to demonstrate the KappaTune fine-tuning methodology, which selectively adapts neural network layers based on their condition number to mitigate catastrophic forgetting.

Model description

This model is a LoRA (Low-Rank Adaptation) adapter applied to TinyLlama-1.1B. Unlike standard LoRA, which targets manually specified module types (e.g., all q_proj or v_proj layers), this adapter was trained using the KappaTune PEFT Integration. Before training, the KappaTune algorithm performed a Singular Value Decomposition (SVD) on all candidate weight matrices to calculate their Condition Number (kappa).

  • High-kappa tensors (highly specialized, anisotropic weights containing pre-trained knowledge) were frozen.
  • Low-kappa tensors (numerically stable, general-purpose weights with higher output entropy, acting as a "raw marble block") were selected for LoRA adaptation.

This targeted approach ensures the model learns the new domain efficiently while preserving its foundational conversational capabilities.
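The selection criterion can be sketched as follows. This is an illustrative reimplementation, not code from the KappaTune release: NumPy is used for brevity, and the function names and the `keep_fraction` threshold are assumptions.

```python
import numpy as np

def condition_number(w):
    # kappa = sigma_max / sigma_min; np.linalg.svd returns singular
    # values in descending order, so the ratio of first to last suffices
    s = np.linalg.svd(w, compute_uv=False)
    return s[0] / s[-1]

def select_low_kappa(weights, keep_fraction=0.5):
    # rank candidate weight matrices by kappa and keep the
    # best-conditioned fraction as LoRA targets; the rest stay frozen
    kappas = {name: condition_number(w) for name, w in weights.items()}
    ranked = sorted(kappas, key=kappas.get)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep], kappas
```

A well-conditioned (e.g. near-orthogonal) matrix has kappa close to 1 and would be selected for adaptation, while a highly anisotropic matrix with a large spread of singular values would be frozen.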

Intended uses & limitations

  • Methodological Proof of Concept: This model serves primarily as a demonstration of the KappaTune automated layer-selection strategy.
  • Domain Adaptation: It illustrates how a general-purpose chat model can be steered toward a specific formatting and vocabulary style (News Reporting) using a tiny fraction of trainable parameters.

Training and evaluation data

The model was fine-tuned on the AG News dataset, a curated collection of news articles categorized into four topics: World, Sports, Business, and Sci/Tech.

Training procedure

The model was trained using the Hugging Face Trainer API combined with the kappaTune optimizer wrapper. Although AG News is traditionally a classification benchmark, the data was formatted as a continuous text stream for this experiment. The model was trained on a Causal Language Modeling task (next-token prediction) to adapt its generation style to news reporting, rather than predicting a categorical label.
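The reformatting of AG News from a classification dataset into a text stream might look like the sketch below. The exact prompt template is an assumption; only the four label names come from the dataset.

```python
# AG News label ids map to these four topics
LABELS = ["World", "Sports", "Business", "Sci/Tech"]

def to_text(example):
    # fold the categorical label into the text itself, so the model is
    # trained with next-token prediction rather than a classifier head
    return {"text": f"Category: {LABELS[example['label']]}\nArticle: {example['text']}"}
```

With `datasets`, such a function would typically be applied via `dataset.map(to_text)` before tokenization.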

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 3
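The hyperparameters above correspond to a Trainer configuration along these lines (the `output_dir` is illustrative; the defaults shown are those listed in the table):

```python
from transformers import TrainingArguments

# configuration fragment mirroring the hyperparameter table above
args = TrainingArguments(
    output_dir="kappatune-lora-tinyllama-agnews",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```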

Training results

Framework versions

  • PEFT 0.17.1
  • Transformers 4.55.4
  • Pytorch 2.6.0+cu124
  • Datasets 4.1.1
  • Tokenizers 0.21.4