
Salamandra 7B - Legal Administrative (SINAI)

This repository contains a domain-adapted version of the Salamandra 7B model, optimized for the Spanish legal and administrative domain.

This model is the result of a continual pre-training process on the Salamandra 7B Base model, followed by instruction tuning using specialized datasets.

The original Salamandra family is released under a permissive Apache 2.0 license.

DISCLAIMER: This model is a domain-specific proof-of-concept designed to demonstrate the capabilities of Salamandra in the legal-administrative field. While optimized for this domain, it has NOT been aligned through RLHF to filter or avoid sensitive topics. As a result, it may generate harmful or inappropriate content, or legally inaccurate information. Users should verify any legal information generated against official sources.


Model Details

Description

This model is a Transformer-based decoder-only language model. It builds upon the Salamandra 7B architecture through a two-stage adaptation process:

Continual Pre-training (CPT): The base model was further pre-trained on the SINAI/ALIA-legal-administrative corpus to adapt its weights to the specific vocabulary and structures of legal and administrative Spanish.

Instruction Tuning: The continually pre-trained model was subsequently fine-tuned using specialized instruction datasets for the domain.

Architecture

| Parameter | Value |
|---|---|
| Base Model | Salamandra 7B |
| Total Parameters | 7,768,117,248 |
| Embedding Parameters | 1,048,576,000 |
| Layers | 32 |
| Hidden size | 4,096 |
| Attention heads | 32 |
| Context length | 8,192 |
| Vocabulary size | 256,000 |
| Precision | bfloat16 |
| Embedding type | RoPE |
| Activation Function | SwiGLU |
| Layer normalization | RMS Norm |
| Flash attention | Yes |
| Grouped Query Attention | Yes |
| Num. query groups | 8 |

Intended Use

Direct Use

The model is intended for research and commercial use specifically within the Spanish legal and public administration context. Typical use cases include:

  • Summarization of administrative documents.
  • Question answering regarding public procedures.
  • Simplification of legal jargon ("Plain Language").
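Each of these use cases reduces to a single-turn chat message in the format expected by the model (see How to use). A minimal sketch, where the instruction wording of each template is a hypothetical example rather than anything prescribed by the model:

```python
# Illustrative prompt templates for the three use cases above.
# The Spanish instruction texts are assumptions for demonstration only.
def build_message(task: str, document: str) -> list[dict]:
    templates = {
        "summarize": "Resume el siguiente documento administrativo:\n\n{doc}",
        "qa": "Responde a la siguiente pregunta sobre un trámite público:\n\n{doc}",
        "simplify": "Reescribe el siguiente texto legal en lenguaje claro:\n\n{doc}",
    }
    return [{"role": "user", "content": templates[task].format(doc=document)}]

# The returned list can be passed to tokenizer.apply_chat_template(...).
message = build_message("simplify", "El interesado deberá subsanar la solicitud...")
```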

Out-of-scope Use

The model is not intended for malicious activities. It is explicitly out of scope to use this model as a replacement for a qualified lawyer or legal advisor. Any downstream application must comply with current laws and regulations.


How to use

Python Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Replace with the actual path to your uploaded model
model_id = "SINAI/salamandra-7b-legal-admin" 

text = "¿Cuáles son los requisitos para presentar una instancia administrativa?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16
)

message = [ { "role": "user", "content": text } ]

prompt = tokenizer.apply_chat_template(
    message,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
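The snippet above uses greedy decoding by default. For more varied output you can pass standard sampling arguments to model.generate; the values below are illustrative assumptions, not tuned defaults for this model:

```python
# Illustrative decoding settings; pass them to the generate call above as
#   outputs = model.generate(input_ids=inputs.to(model.device), **gen_kwargs)
gen_kwargs = {
    "max_new_tokens": 200,
    "do_sample": True,        # sample instead of greedy decoding
    "temperature": 0.7,       # lower values give more deterministic output
    "top_p": 0.9,             # nucleus sampling cutoff
    "repetition_penalty": 1.1,
}
```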

Data

Domain Adaptation Data (SINAI)

To specialize the model, we utilized high-quality datasets provided by the SINAI Research Group (Universidad de Jaén):

  • Continual Pre-training
    • Dataset: SINAI/ALIA-legal-administrative
    • Description: A large corpus of texts belonging to the legal and administrative domain in Spanish. This dataset was used to adapt the linguistic distribution of the base model to the target domain.

Original Pre-training Data (Base Model)

The underlying base model (Salamandra 7B) was pre-trained on 12.875 trillion tokens of highly curated data covering 35 European languages and code. For a detailed list of the original pre-training sources, please refer to the Original Salamandra Model Card.

Citation

If you use this model or the datasets, please cite the Salamandra Technical Report and the SINAI datasets accordingly.
