Disclaimer: This model is a research experiment and does not provide legal advice. The author is not responsible for any legal inaccuracies or misuse of the information generated by this model. Always consult the Official Journal of the European Union (OJ) for the definitive text of the AI Act.

Gemma 4 (E4B) Fine-tuned on EU AI Act — Spanish Legal Dataset

Domain-adapted LLM for the EU AI Act, with full training pipeline, evaluation, and overfitting analysis.

v1 — Baseline fine-tuning + real-world limitations


Overview

This repository contains a fine-tuned version of Gemma 4 (E4B) trained on a Spanish dataset derived from the EU AI Act.

The goal was to evaluate how far a relatively small model (≈4B parameters) can be pushed in a highly specialized legal domain using supervised fine-tuning.

The dataset (~7.3k Q&A pairs) was generated from a single legal source, which makes it ideal for testing domain adaptation — but also exposes important limitations.

Training Metrics

Figure: training loss curve

Figure: training and evaluation curves

Final metrics:

  • Train loss: 1.08
  • Best eval loss: 1.98
  • Perplexity: 7.26
  • ROUGE-L (avg): 0.28
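Perplexity here follows the usual definition: the exponential of the mean cross-entropy loss (in nats). A minimal sketch; the small gap to the reported 7.26 presumably comes from the exact loss snapshot used:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity is exp(mean cross-entropy loss in nats)."""
    return math.exp(mean_ce_loss)

print(round(perplexity(1.98), 2))  # ≈ 7.24 for the best eval loss above
```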

Interpretation:

The model converges smoothly without strong overfitting, but the evaluation results point to limited generalization caused by the homogeneity of the dataset.


Key Findings

  • Fine-tuning improves domain awareness significantly
  • The model stops asking for clarification and answers confidently
  • However, generalization remains limited

Observed behavior:

  • Learns structure and terminology of the AI Act
  • Produces plausible but sometimes incorrect legal statements
  • Hallucinates when precise recall is required

Overfitting Analysis

This project intentionally documents a realistic failure mode:

  • Train loss decreases consistently
  • Eval loss plateaus early (~1.98)
  • Minimal gap between best and final eval → no catastrophic overfitting
  • BUT: poor factual robustness

👉 The model is not simply memorizing: it is learning patterns without true grounding.


Core Insight

Dataset diversity matters more than dataset size.

Even with thousands of samples, fine-tuning on a single document does not produce a reliable standalone legal model.

This is especially critical in domains like law, where precision matters.


Training Setup

  • Framework: Unsloth + TRL (SFTTrainer)
  • Method: LoRA fine-tuning
  • Precision: bfloat16 training
  • Export: GGUF (q4_k_m)
  • Max sequence length: 512
  • Learning rate: 5e-5
  • Batch (effective): 8
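LoRA keeps the base weights frozen and trains only low-rank adapters: for a weight matrix of shape (d_out, d_in), a rank-r adapter adds r·(d_in + d_out) parameters. A rough sketch of why this is cheap; the dimensions and rank below are illustrative, not the values of this run:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    # A (r x d_in) down-projection plus B (d_out x r) up-projection.
    return r * d_in + d_out * r

# Illustrative: one 4096x4096 projection matrix with a rank-16 adapter.
full = 4096 * 4096
added = lora_params(4096, 4096, 16)
print(added, f"{added / full:.2%}")  # the adapter is well under 1% of the frozen matrix
```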

Pipeline includes:

  • Dataset validation and deduplication
  • Baseline model evaluation
  • Post-training qualitative comparison
  • ROUGE-L evaluation on test set
  • Perplexity tracking
  • Full logging + reproducibility config
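The ROUGE-L step in the pipeline reduces to a longest-common-subsequence F-measure between reference and generated answers. A minimal self-contained sketch, assuming whitespace tokenization; the actual pipeline may use a library such as rouge-score:

```python
def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l_f1(reference: str, candidate: str) -> float:
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l_f1("el sistema de IA es de alto riesgo",
                 "el sistema es de riesgo alto"))  # ≈ 0.71
```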

Model Outputs

The repository includes:

  • GGUF model (quantized for local inference)
  • Multimodal projection file (if applicable)
  • Training logs and evaluation outputs
  • Training scripts (end-to-end pipeline)

Usage

The model is exported to GGUF and can be used with:

  • LM Studio
  • Ollama
  • llama.cpp
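For Ollama, for example, the quantized export can be wrapped in a Modelfile. A minimal sketch; the GGUF file name and system prompt are illustrative, not files shipped under these exact names:

```
# Modelfile: illustrative; point FROM at the downloaded GGUF
FROM ./gemma-e4b-eu-ai-act.q4_k_m.gguf
PARAMETER temperature 0.2
SYSTEM """Eres un asistente jurídico especializado en el Reglamento de IA de la UE. Responde en español."""
```

Then `ollama create eu-ai-act -f Modelfile` registers the model locally, and `ollama run eu-ai-act` starts a chat session.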

Example use cases:

  • Spanish legal assistant for querying the EU AI Act
  • Internal compliance support tools
  • Prototyping legal AI workflows

Limitations

This model should NOT be used as a standalone legal authority.

Known limitations:

  • Inaccurate or incomplete legal interpretations
  • Hallucinations in edge cases
  • Lack of grounding in exact legal text
  • Sensitivity to prompt phrasing

Dataset

⚠️ Dataset Notice: The Q&A dataset (~7.3k pairs) was synthetically generated with Claude (Anthropic) from the official EU AI Act text. It has not been verified by legal experts, and only a subset of samples was manually reviewed. Do not use it for real legal advice.

Known limitation: dataset contains significant topical redundancy due to the generation method. Multiple Q&A pairs cover identical concepts with slight phrasing variations. This contributes to the observed generalization plateau.
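The redundancy described above can be quantified with a simple token-overlap (Jaccard) check between question pairs. A minimal sketch; the threshold is chosen arbitrarily, and the pipeline's actual deduplication step may differ:

```python
import re

def tokens(s: str) -> set:
    # Unicode-aware word tokens, case-folded; strips punctuation such as "¿" and "?".
    return set(re.findall(r"\w+", s.lower()))

def jaccard(a: str, b: str) -> float:
    sa, sb = tokens(a), tokens(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def near_duplicates(questions, threshold=0.7):
    """Return index pairs of questions whose token overlap meets the threshold."""
    return [(i, j)
            for i in range(len(questions))
            for j in range(i + 1, len(questions))
            if jaccard(questions[i], questions[j]) >= threshold]

qs = [
    "¿Qué sistemas de IA se consideran de alto riesgo?",
    "¿Qué sistemas de IA se consideran de alto riesgo según el reglamento?",
    "¿Cuáles son las obligaciones de los proveedores?",
]
print(near_duplicates(qs))  # the first two questions are flagged: [(0, 1)]
```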

This model was fine-tuned on the EU AI Act Spanish Dataset.

Recommended Use (Important)

This model performs significantly better when combined with a Retrieval-Augmented Generation (RAG) pipeline.

👉 Without RAG → approximate reasoning
👉 With RAG → grounded legal assistant
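The RAG recommendation can be illustrated with a toy retrieval step: score passages of the Act by token overlap with the query and prepend the best match to the prompt before calling the model. A minimal sketch; the article snippets and prompt template are illustrative (not the actual wording of the Act), and a real pipeline would use embedding search over the official text:

```python
import re

def _tokens(text: str) -> set:
    # Unicode-aware, case-folded word tokens.
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, chunks: list[str]) -> str:
    """Return the chunk sharing the most tokens with the query (toy lexical search)."""
    q = _tokens(query)
    return max(chunks, key=lambda c: len(q & _tokens(c)))

def build_prompt(query: str, chunks: list[str]) -> str:
    # Ground the model by prepending the retrieved passage to the question.
    context = retrieve(query, chunks)
    return (f"Contexto (texto oficial):\n{context}\n\n"
            f"Pregunta: {query}\nResponde citando el contexto.")

# Illustrative snippets, NOT the actual wording of the AI Act.
chunks = [
    "Artículo 5: prácticas de IA prohibidas ...",
    "Artículo 6: reglas de clasificación de sistemas de IA de alto riesgo ...",
]
prompt = build_prompt("¿Cómo se clasifica un sistema de alto riesgo?", chunks)
print(prompt)
```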


Next Steps

  • Incorporate multiple legal sources (GDPR, case law, guidelines)
  • Compare fine-tuning vs RAG vs hybrid approaches
  • Evaluate against real-world legal queries
  • Improve dataset diversity and structure

Why this repo

This is not a "perfect model" repository.

It is a transparent, real-world experiment showing:

  • What works in legal LLM fine-tuning
  • What breaks
  • And why RAG is often necessary

Author note

Built as part of a practical exploration of AI systems applied to legal and regulatory domains (EU AI Act).

Focus: real deployment constraints, not just benchmarks.
