---
license: apache-2.0
language:
- en
- zh
- es
- ur
tags:
- lora
- aya
- tiny-aya
- multilingual
- code
- legesher
- tiny-aya-expedition
- language-decoded
- unsloth
library_name: transformers
base_model:
- CohereLabs/tiny-aya-base
pipeline_tag: text-generation
---

# Language Decoded LoRA

QLoRA adapters fine-tuned on multilingual code conditions for the **Language Decoded** project (part of [Cohere's Tiny Aya Expedition](https://aya.for.ai)).

## Research Question

> Does fine-tuning on non-English code improve multilingual reasoning, and is the benefit language-dependent or structure-dependent?

## Base Model

All adapters are trained on [CohereLabs/tiny-aya-base](https://huggingface.co/CohereLabs/tiny-aya-base) (3.35B parameters).

## Model Structure

This repo is the canonical hub for all Language Decoded LoRA adapters, organized by experimental condition:

| Subdirectory          | Condition   | Training Data                                         |
| --------------------- | ----------- | ----------------------------------------------------- |
| `condition-1-en-32k/` | Condition 1 | English Python from The Stack Dedup (full 32k corpus) |
| `condition-1-en-5k/`  | Condition 1 | English Python from The Stack Dedup (5k subset)       |
| `condition-2-zh-5k/`  | Condition 2 | Chinese keyword-swapped Python (Legesher-transpiled)  |
| `condition-2-es-5k/`  | Condition 2 | Spanish keyword-swapped Python (Legesher-transpiled)  |
| `condition-2-ur-5k/`  | Condition 2 | Urdu keyword-swapped Python (Legesher-transpiled)     |
| `condition-3-zh-5k/`  | Condition 3 | Transpiled + native Chinese code (blended)            |

### The Experimental Ladder

- **Baseline --> 1**: Does code help at all?
- **1 --> 2**: Does the language of the keywords matter?
- **2 --> 3**: Does diversity of native-language sources add value beyond a keyword swap?
- **3 --> 4**: Does code written in the cultural context of a language carry unique signal?
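To make the keyword-swapped conditions concrete, here is a minimal, illustrative sketch of what a keyword swap does to Python source. This is **not** Legesher's actual implementation (which operates on the token stream), and the English-to-Spanish keyword map below is a hypothetical example, not the real mapping:

```python
import re

# Hypothetical English -> Spanish keyword map, for illustration only;
# the actual Legesher mappings may differ.
KEYWORD_MAP = {
    "def": "definir",
    "return": "retornar",
    "if": "si",
    "else": "sino",
    "for": "para",
    "in": "en",
}


def swap_keywords(source: str, mapping: dict) -> str:
    """Replace whole-word Python keywords, leaving identifiers intact.

    Note: this naive regex pass would also rewrite keywords inside string
    literals and comments; a real transpiler avoids that by working on
    the tokenized source rather than raw text.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


english = "def square(x):\n    return x * x\n"
print(swap_keywords(english, KEYWORD_MAP))
# definir square(x):
#     retornar x * x
```

The word-boundary anchors (`\b`) keep identifiers that merely contain a keyword (e.g., a variable named `define`) untouched, which is the property that makes the Condition 2 corpora structurally identical to the English corpus while differing only in surface keywords.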
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("CohereLabs/tiny-aya-base")
tokenizer = AutoTokenizer.from_pretrained("CohereLabs/tiny-aya-base")

# Load a LoRA adapter (e.g., Condition 1: English code)
model = PeftModel.from_pretrained(
    base_model,
    "legesher/language-decoded-lora",
    subfolder="condition-1-en-5k",
)

# Or load a language-specific adapter (e.g., Condition 2: Chinese keyword-swapped)
model = PeftModel.from_pretrained(
    base_model,
    "legesher/language-decoded-lora",
    subfolder="condition-2-zh-5k",
)
```

## Training Details

| Parameter          | Value                                                                                            |
| ------------------ | ------------------------------------------------------------------------------------------------ |
| Base model         | [CohereLabs/tiny-aya-base](https://huggingface.co/CohereLabs/tiny-aya-base) (3.35B params)       |
| Method             | QLoRA 4-bit (NF4), ~5.4GB VRAM                                                                   |
| Hardware           | Kaggle T4 (16GB)                                                                                 |
| Tokenizer          | CohereLabs/tiny-aya-base                                                                         |
| Transpilation tool | [Legesher](https://github.com/legesher/legesher) v0.7.3                                          |
| Training data      | [legesher/language-decoded-data](https://huggingface.co/datasets/legesher/language-decoded-data) |

### QLoRA Hyperparameters

| Parameter       | Value                                                         |
| --------------- | ------------------------------------------------------------- |
| LoRA rank (`r`) | 16                                                            |
| LoRA alpha      | 32                                                            |
| LoRA dropout    | 0.0                                                           |
| Target modules  | q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj |
| Bias            | none                                                          |
| Task type       | CAUSAL_LM                                                     |
| PEFT version    | 0.18.1                                                        |
| Quantization    | NF4 (4-bit) via Unsloth                                       |

## Evaluation

Models are evaluated on multilingual reasoning benchmarks with dual prompts (English + language-specific):

| Benchmark | What it measures           | Examples per language |
| --------- | -------------------------- | --------------------- |
| MGSM      | Math reasoning             | 250 (full set)        |
| X-CSQA    | Commonsense reasoning      | ~1,000 (full set)     |
| XNLI      | Natural language inference | ~5,000 (full set)     |

_Results will be added as evaluation completes._

## Limitations

- **Single base model**: All adapters are trained on CohereLabs/tiny-aya-base (3.35B params). Results may not generalize to larger or architecturally different models.
- **Limited training data**: Each condition uses a 5k-file subset for QLoRA fine-tuning, constrained by Kaggle T4 hardware limits.
- **Evaluation scope**: Currently evaluated on 3 benchmarks (MGSM, X-CSQA, XNLI). Other reasoning tasks may show different patterns.
- **Consumer hardware**: Training on Kaggle T4 (16GB) with 4-bit quantization introduces approximation that may affect adapter quality compared to full-precision training.

## Related Resources

- **Training data**: [legesher/language-decoded-data](https://huggingface.co/datasets/legesher/language-decoded-data)
- **Community code**: [legesher/language-decoded-community](https://huggingface.co/datasets/legesher/language-decoded-community)
- **Experiment tracking**: [legesher/language-decoded-experiments](https://huggingface.co/datasets/legesher/language-decoded-experiments)
- **Transpilation tool**: [Legesher on GitHub](https://github.com/legesher/legesher)

## Citation

```bibtex
@misc{language-decoded-2026,
  title={Language Decoded: Investigating Language-Dependent vs. Structure-Dependent Reasoning Benefits of Code},
  author={Madison Edgar and Saad Ahmed Bazaz and Tom Sherborne and Rashik Shahjahan and Khojasteh Mirza and Sarah Jawaid and Rafay Mustafa and Sohaib Ahmed Bazaz},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/legesher/language-decoded-lora}
}
```

## License

Apache 2.0