kappatune-lora-tinyllama-agnews

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0 on the AG News dataset. It was trained to demonstrate the KappaTune fine-tuning methodology, which selectively adapts neural network layers based on their condition number to mitigate catastrophic forgetting.

Model description

This model is a LoRA (Low-Rank Adaptation) adapter applied to TinyLlama-1.1B. Unlike standard LoRA, which targets manually specified module types (e.g., all q_proj or v_proj layers), this adapter was trained using the KappaTune PEFT Integration. Before training, the KappaTune algorithm performed a Singular Value Decomposition (SVD) on all candidate weight matrices to calculate their Condition Number (kappa).

  • High-kappa tensors (highly specialized, anisotropic weights containing pre-trained knowledge) were frozen.
  • Low-kappa tensors (numerically stable, general-purpose weights with higher output entropy, acting as a "raw marble block") were selected for LoRA adaptation.

This targeted approach ensures the model learns the new domain efficiently while preserving its foundational conversational capabilities.
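The selection criterion can be sketched as follows. This is an illustrative reimplementation, not code from the KappaTune release: NumPy is used for brevity, and the function names and the `keep_fraction` threshold are assumptions.

```python
import numpy as np

def condition_number(w):
    # kappa = sigma_max / sigma_min; np.linalg.svd returns singular
    # values in descending order, so the ratio of first to last suffices
    s = np.linalg.svd(w, compute_uv=False)
    return s[0] / s[-1]

def select_low_kappa(weights, keep_fraction=0.5):
    # rank candidate weight matrices by kappa and keep the
    # best-conditioned fraction as LoRA targets; the rest stay frozen
    kappas = {name: condition_number(w) for name, w in weights.items()}
    ranked = sorted(kappas, key=kappas.get)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep], kappas
```

A well-conditioned (e.g. near-orthogonal) matrix has kappa close to 1 and would be selected for adaptation, while a highly anisotropic matrix with a large spread of singular values would be frozen.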

Intended uses & limitations

  • Methodological Proof of Concept: This model serves primarily as a demonstration of the KappaTune automated layer-selection strategy.
  • Domain Adaptation: It illustrates how a general-purpose chat model can be steered toward a specific formatting and vocabulary style (News Reporting) using a tiny fraction of trainable parameters.

Training and evaluation data

The model was fine-tuned on the AG News dataset, a curated collection of news articles categorized into four topics: World, Sports, Business, and Sci/Tech.

Training procedure

The model was trained using the Hugging Face Trainer API combined with the kappaTune optimizer wrapper. Although AG News is traditionally a classification benchmark, the data was formatted as a continuous text stream for this experiment. The model was trained on a Causal Language Modeling task (next-token prediction) to adapt its generation style to news reporting, rather than predicting a categorical label.
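The reformatting of AG News from a classification dataset into a text stream might look like the sketch below. The exact prompt template is an assumption; only the four label names come from the dataset.

```python
# AG News label ids map to these four topics
LABELS = ["World", "Sports", "Business", "Sci/Tech"]

def to_text(example):
    # fold the categorical label into the text itself, so the model is
    # trained with next-token prediction rather than a classifier head
    return {"text": f"Category: {LABELS[example['label']]}\nArticle: {example['text']}"}
```

With `datasets`, such a function would typically be applied via `dataset.map(to_text)` before tokenization.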

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 3
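The hyperparameters above correspond to a Trainer configuration along these lines (the `output_dir` is illustrative; the defaults shown are those listed in the table):

```python
from transformers import TrainingArguments

# configuration fragment mirroring the hyperparameter table above
args = TrainingArguments(
    output_dir="kappatune-lora-tinyllama-agnews",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    num_train_epochs=3,
)
```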

Training results

Framework versions

  • PEFT 0.17.1
  • Transformers 4.55.4
  • Pytorch 2.6.0+cu124
  • Datasets 4.1.1
  • Tokenizers 0.21.4