Salamandra 7B - Legal Administrative (SINAI)
This repository contains a domain-adapted version of the Salamandra 7B model, optimized for the Spanish legal and administrative domain.
This model is the result of a continual pre-training process on the Salamandra 7B Base model, followed by instruction tuning using specialized datasets.
The original Salamandra family is released under a permissive Apache 2.0 license.
DISCLAIMER: This model is a domain-specific proof-of-concept designed to demonstrate the capabilities of Salamandra in the legal-administrative field. While optimized for this domain, it has NOT been aligned through RLHF to filter or avoid sensitive topics. As a result, it may generate harmful or inappropriate content, or legally inaccurate information. Users should verify any legal information generated against official sources.
Model Details
Description
This model is a Transformer-based decoder-only language model. It builds upon the Salamandra 7B architecture through the following adaptation process:
Continual Pre-training (CPT): The base model was further pre-trained on the SINAI/ALIA-legal-administrative corpus to adapt its weights to the specific vocabulary and structures of legal and administrative Spanish.
Architecture
| Hyperparameter | Value |
|---|---|
| Base Model | Salamandra 7B |
| Total Parameters | 7,768,117,248 |
| Embedding Parameters | 1,048,576,000 |
| Layers | 32 |
| Hidden size | 4,096 |
| Attention heads | 32 |
| Context length | 8,192 |
| Vocabulary size | 256,000 |
| Precision | bfloat16 |
| Embedding type | RoPE |
| Activation Function | SwiGLU |
| Layer normalization | RMS Norm |
| Flash attention | ✅ |
| Grouped Query Attention | ✅ |
| Num. query groups | 8 |
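The derived figures in the table can be sanity-checked with a little arithmetic; the sketch below uses only the values listed above:

```python
# Sanity-check the architecture numbers from the table above.
vocab_size = 256_000
hidden_size = 4_096
n_heads = 32
n_query_groups = 8

# Embedding parameters = vocabulary size × hidden size
embedding_params = vocab_size * hidden_size
print(embedding_params)  # 1,048,576,000 — matches the table

# Per-head dimension with 32 attention heads
head_dim = hidden_size // n_heads
print(head_dim)  # 128

# With Grouped Query Attention, query heads share KV heads group-wise
queries_per_kv_head = n_heads // n_query_groups
print(queries_per_kv_head)  # 4 query heads per KV head
```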
Intended Use
Direct Use
The model is intended for research and commercial use specifically within the Spanish legal and public administration context. Typical use cases include:
- Summarization of administrative documents.
- Question answering regarding public procedures.
- Simplification of legal jargon ("Plain Language").
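For instance, the "Plain Language" use case can be framed as a chat message before applying the model's chat template. `simplify_request` below is a hypothetical helper for illustration, not part of any released API, and the instruction wording is only a suggestion:

```python
# Illustrative only: wrap an administrative text in a plain-language
# rewriting instruction, in the chat-message format expected by
# tokenizer.apply_chat_template.
def simplify_request(document: str) -> list[dict]:
    instruction = (
        "Reescribe el siguiente texto administrativo en lenguaje claro, "
        "conservando su significado legal:\n\n"
    )
    return [{"role": "user", "content": instruction + document}]

messages = simplify_request(
    "Se requiere al interesado la subsanación de la solicitud presentada."
)
print(messages[0]["role"])  # user
```

The resulting list can be passed directly to `tokenizer.apply_chat_template` as in the usage example in this card.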
Out-of-scope Use
The model is not intended for malicious activities. It is explicitly out of scope to use this model as a replacement for a qualified lawyer or legal advisor. Any downstream application must comply with current laws and regulations.
How to use
Python Example
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Replace with the actual path to your uploaded model
model_id = "SINAI/salamandra-7b-legal-admin"
text = "¿Cuáles son los requisitos para presentar una instancia administrativa?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Format the question using the model's chat template
message = [{"role": "user", "content": text}]
prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True,
)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
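The call above uses greedy decoding. For longer or more varied answers, sampling options can be passed to `model.generate` as keyword arguments; the values below are illustrative starting points, not tuned recommendations:

```python
# Illustrative sampling settings for model.generate(...); the specific
# values are assumptions, not recommendations tuned for this model.
gen_kwargs = dict(
    max_new_tokens=400,       # allow longer completions
    do_sample=True,           # enable sampling instead of greedy decoding
    temperature=0.7,          # soften the output distribution
    top_p=0.9,                # nucleus sampling cutoff
    repetition_penalty=1.1,   # discourage verbatim repetition
)
# Usage: model.generate(input_ids=inputs.to(model.device), **gen_kwargs)
print(sorted(gen_kwargs))
```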
Data
Domain Adaptation Data (SINAI)
To specialize the model, we utilized high-quality datasets provided by the SINAI Research Group (Universidad de Jaén):
- Continual Pre-training
  - Dataset: SINAI/ALIA-legal-administrative
  - Description: A large corpus of Spanish texts from the legal and administrative domain, used to adapt the linguistic distribution of the base model to the target domain.
Original Pre-training Data (Base Model)
The underlying base model (Salamandra 7B) was pre-trained on 12.875 trillion tokens of highly curated data covering 35 European languages and code. For a detailed list of the original pre-training sources, please refer to the Original Salamandra Model Card.
Citation
If you use this model or the datasets, please cite the Salamandra Technical Report and the SINAI datasets accordingly.