kappatune-lora-tinyllama-agnews
This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 on the AG News dataset. It was trained to demonstrate the KappaTune fine-tuning methodology, which selectively adapts neural network layers based on their condition number to mitigate catastrophic forgetting.
Model description
This model is a LoRA (Low-Rank Adaptation) adapter applied to TinyLlama-1.1B. Unlike standard LoRA, which targets manually specified module types (e.g., all q_proj or v_proj layers), this adapter was trained using the KappaTune PEFT Integration.
Before training, the KappaTune algorithm performed a Singular Value Decomposition (SVD) on all candidate weight matrices to calculate their Condition Number (kappa).
- High-kappa tensors (highly specialized, anisotropic weights containing pre-trained knowledge) were frozen.
- Low-kappa tensors (numerically stable, general-purpose weights with higher output entropy, acting as a "raw marble block" for new knowledge) were selected for LoRA adaptation.
This targeted approach ensures the model learns the new domain efficiently while preserving its foundational conversational capabilities.
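The selection step described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the kappaTune repository's actual implementation: `condition_number` and `select_low_kappa_modules` are hypothetical helper names, and ranking linear layers by kappa and keeping the lowest-scoring fraction is one simple way to realize the "freeze high-kappa, adapt low-kappa" rule.

```python
import torch
from torch import nn

def condition_number(weight: torch.Tensor) -> float:
    """kappa(W) = sigma_max / sigma_min, computed from the singular values."""
    s = torch.linalg.svdvals(weight.float())
    return (s.max() / s.min().clamp_min(1e-12)).item()

def select_low_kappa_modules(model: nn.Module, fraction: float = 0.5) -> list[str]:
    """Return names of the lowest-kappa linear modules, as LoRA targets."""
    scores = {
        name: condition_number(mod.weight)
        for name, mod in model.named_modules()
        if isinstance(mod, nn.Linear)
    }
    ranked = sorted(scores, key=scores.get)  # ascending kappa
    return ranked[: max(1, int(len(ranked) * fraction))]
```

The returned module names could then be passed to a PEFT `LoraConfig` via `target_modules`, so the adapter is attached only to the well-conditioned layers while the rest of the network stays frozen.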
- Paper: The Condition Number as a Scale-Invariant Proxy for Information Encoding in Neural Units (arXiv:2506.16289)
- Repository: oswaldoludwig/kappaTune
Intended uses & limitations
- Methodological Proof of Concept: This model serves primarily as a demonstration of the KappaTune automated layer-selection strategy.
- Domain Adaptation: It illustrates how a general-purpose chat model can be steered toward a specific formatting and vocabulary style (News Reporting) using a tiny fraction of trainable parameters.
Training and evaluation data
The model was fine-tuned on the AG News dataset, a curated collection of news articles categorized into four topics: World, Sports, Business, and Sci/Tech.
Training procedure
The model was trained using the Hugging Face Trainer API combined with the kappaTune optimizer wrapper. Although AG News is traditionally a classification benchmark, the data was formatted as a continuous text stream for this experiment. The model was trained on a Causal Language Modeling task (next-token prediction) to adapt its generation style to news reporting, rather than predicting a categorical label.
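The reformatting of AG News from labeled examples into a continuous text stream might look like the sketch below. The prompt template and the `format_example` helper are assumptions for illustration; only the four topic names come from the dataset itself.

```python
# AG News topic names, indexed by the dataset's integer labels
LABELS = ["World", "Sports", "Business", "Sci/Tech"]

def format_example(example: dict) -> dict:
    """Turn a (text, label) pair into a single text stream so the model
    learns next-token prediction over news prose, not a categorical label."""
    topic = LABELS[example["label"]]
    return {"text": f"Topic: {topic}\n{example['text']}"}
```

With Hugging Face `datasets`, such a function would typically be applied with `dataset.map(format_example)` before tokenization.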
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 3
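The hyperparameters above map directly onto a `transformers.TrainingArguments` configuration. This is a config sketch under the assumption that the standard `Trainer` API was used as stated; the `output_dir` value is a placeholder.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter list above; output_dir is a placeholder
args = TrainingArguments(
    output_dir="kappatune-lora-tinyllama-agnews",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```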
Training results
Framework versions
- PEFT 0.17.1
- Transformers 4.55.4
- PyTorch 2.6.0+cu124
- Datasets 4.1.1
- Tokenizers 0.21.4