Arc Quasar 7B — Compliance ES

Arc Quasar 7B Compliance ES is a QLoRA-fine-tuned adapter built on Qwen2.5-7B-Instruct for structured compliance analysis of contract clauses under European and Spanish data governance regulations.

The model identifies applicable normative frameworks, flags gaps and risks, and produces structured reports with article-level citations. It is designed for preliminary screening, not as a substitute for legal counsel.

Organization: vArx-ai — Arc family, size L.


Covered Regulations

| Regulation | Scope |
|---|---|
| RGPD (EU 2016/679) | General data protection |
| LOPDGDD (LO 3/2018) | Spanish adaptation of the GDPR |
| Data Act (EU 2023/2854) | Data access and portability in data spaces |
| DGA — Data Governance Act (EU 2022/868) | Data intermediaries and data altruism |
| AI Act (EU 2024/1689) | Risk-based classification of AI systems |

Output Format

The model returns structured compliance reports using a consistent three-tier marker system:

  • ✅ — Clause satisfies the cited requirement
  • ⚠️ — Partial compliance or missing element requiring attention
  • ❓ — Human legal review required; ambiguous or context-dependent finding

Each finding includes the specific article reference (e.g., RGPD Art. 5.1.b).

Example output (in Spanish, as produced by the model):

```
✅ RGPD Art. 5.1.b (limitación de la finalidad) — La cláusula establece expresamente
   que el tratamiento se limita a la finalidad contractual.

⚠️ RGPD Art. 28 — Si el proveedor actúa como encargado del tratamiento, el contrato
   debe incluir las cláusulas mínimas del Art. 28.3.

❓ Requiere revisión: Verificar si existe contrato de encargo de tratamiento formalizado
   y si la base jurídica del tratamiento está correctamente identificada.
```

In English: the clause expressly limits processing to the contractual purpose (Art. 5.1.b, purpose limitation); if the provider acts as a processor, the contract must include the minimum clauses of Art. 28.3; and a human reviewer should verify that a formal data processing agreement exists and that the legal basis is correctly identified.
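Because the three-tier marker format is consistent, reports are straightforward to post-process downstream. As an illustration (this parser is not part of the released tooling; the marker handling and article-reference pattern are assumptions based on the examples above), a minimal sketch:

```python
import re

# Marker codepoints -> status, per the three-tier report format:
# \u2705 = check mark, \u26a0 = warning sign, \u2753 = question mark.
MARKERS = {"\u2705": "compliant", "\u26a0": "partial", "\u2753": "needs_review"}

def parse_report(report: str) -> list[dict]:
    """Split a marker-formatted report into individual findings."""
    findings = []
    text = report.replace("\ufe0f", "")  # drop emoji variation selectors
    # A new finding starts at a line that begins with one of the markers.
    for block in re.split(r"\n(?=[\u2705\u26a0\u2753])", text.strip()):
        status = MARKERS.get(block[:1])
        if status is None:
            continue
        body = " ".join(block[1:].split())  # collapse wrapped lines
        # Article references look like "RGPD Art. 5.1.b" or "RGPD Art. 28".
        ref = re.search(r"\b[A-Z]+\s+Art\.\s*[\d.]+[a-z]?", body)
        findings.append({
            "status": status,
            "article": ref.group(0) if ref else None,
            "text": body,
        })
    return findings

sample = (
    "✅ RGPD Art. 5.1.b — La cláusula limita el tratamiento a la finalidad contractual.\n"
    "⚠️ RGPD Art. 28 — Pueden faltar las cláusulas mínimas del Art. 28.3."
)
for finding in parse_report(sample):
    print(finding["status"], finding["article"])
# compliant RGPD Art. 5.1.b
# partial RGPD Art. 28
```

Normalizing away the `\ufe0f` variation selector makes the parser robust to whether the model emits `⚠` with or without the emoji presentation form.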

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "varx-ai/arc-quasar-7b-compliance-es"

# Load the base model in float16 and attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

# System prompt (Spanish): "You are an expert assistant on European and Spanish
# data-space regulation. You analyse contract clauses and identify which rules
# apply, what is satisfied, and what may be missing. You never give definitive
# legal verdicts or replace professional legal advice."
SYSTEM = (
    "Eres un asistente experto en normativa de espacios de datos europeos y españoles. "
    "Analizas cláusulas contractuales e identificas qué normativa aplica, "
    "qué se cumple y qué podría faltar. "
    "Nunca das veredictos jurídicos definitivos ni sustituyes el asesoramiento jurídico profesional."
)

# Example clause (Spanish): "The provider will process customers' personal data
# solely to deliver the contracted service, without disclosing it to third parties."
clause = (
    "El proveedor tratará los datos personales de los clientes únicamente "
    "para la prestación del servicio contratado, sin cederlos a terceros."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": clause},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

With 4-bit quantization (recommended for consumer GPUs)

```python
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Replaces the float16 load above; the adapter attaches the same way.
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Training Details

Dataset

The training set is a synthetic dataset of 484 instruction-tuning pairs generated from 28 hand-crafted seed examples grounded in the actual text of the RGPD, LOPDGDD, Data Act, DGA, and AI Act. Seeds were expanded via clause templates covering data-processing legal bases, data subject rights, security measures, data transfers, DGA intermediary obligations, and AI Act risk categories. Split: 436 train / 48 validation.
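The template-expansion step can be pictured as filling clause templates with regulation-specific slot values. A toy sketch (the actual templates and generation pipeline are not published; all names and strings below are illustrative):

```python
from itertools import product

# Illustrative only: the real seed templates are not included in this card.
TEMPLATES = [
    "El {party} tratará los datos {purpose_clause}.",
    "El {party} podrá ceder los datos a terceros {purpose_clause}.",
]
SLOTS = {
    "party": ["proveedor", "encargado del tratamiento"],
    "purpose_clause": [
        "únicamente para la prestación del servicio",
        "para fines de mercadotecnia",
    ],
}

def expand(templates, slots):
    """Yield one instruction-tuning pair per template/slot combination."""
    keys = list(slots)
    for tpl in templates:
        for values in product(*(slots[k] for k in keys)):
            clause = tpl.format(**dict(zip(keys, values)))
            yield {
                "instruction": "Analiza la cláusula según RGPD/LOPDGDD.",
                "input": clause,
            }

pairs = list(expand(TEMPLATES, SLOTS))
print(len(pairs))  # 2 templates x 2 parties x 2 purposes = 8
```

Even a handful of seeds multiplies quickly this way, which is consistent with 28 seeds yielding 484 pairs.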

Method

QLoRA — 4-bit NF4 quantization (bitsandbytes) with LoRA adapters trained in float16.

Hyperparameters

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-7B-Instruct |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Quantization | 4-bit NF4 (bitsandbytes) |
| Compute dtype | float16 |
| Epochs | 3 |
| Batch size | 1 |
| Gradient accumulation | 8 (effective batch size: 8) |
| Max sequence length | 1,024 tokens |
| Learning rate schedule | Cosine |
| Peak learning rate | ~2.0 × 10⁻⁴ |
| Hardware | Kaggle — NVIDIA Tesla T4 × 2 |
| Framework | transformers 4.x · trl >= 0.13 · peft 0.18.1 |
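The training script itself is not published; assuming a standard peft + trl `SFTTrainer` pipeline (the card lists trl among its frameworks), the hyperparameters above translate roughly to the following configuration sketch. The `output_dir` value is a placeholder:

```python
from peft import LoraConfig
from trl import SFTConfig

# Reconstructed from the hyperparameter table; treat as an approximation,
# not the authors' actual training configuration.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="arc-quasar-7b-compliance-es",  # placeholder path
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size 8
    learning_rate=2e-4,              # peak LR, cosine decay
    lr_scheduler_type="cosine",
    max_seq_length=1024,
    fp16=True,
)
```

With 436 training examples, batch size 1, and gradient accumulation 8, three epochs give roughly 436 / 8 × 3 ≈ 163 optimizer steps, consistent with the 165 total steps reported below.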

Training Metrics

| Epoch | Eval Loss |
|---|---|
| 1 | 0.6715 |
| 2 | 0.5131 |
| 3 | 0.4892 |
  • Best checkpoint: step 165 (epoch 3)
  • Token accuracy (final epoch): 88.1%
  • Total training steps: 165

Limitations

  • The adapter covers only the five regulations listed above. Clauses touching other frameworks (NIS2, ePrivacy, sector-specific regulations) are outside its training distribution.
  • The dataset is synthetic and template-generated. Real-world clause variety — unusual phrasing, multi-jurisdictional contracts, legacy terminology — may reduce reliability.
  • The model explicitly declines to issue definitive legal opinions and always recommends human review for ambiguous findings.
  • Quantization (4-bit) introduces a small accuracy trade-off versus full-precision inference.
  • Not suitable for high-stakes legal determinations without review by a qualified legal professional.

Intended Use

Designed for:

  • Digital consulting teams performing preliminary contract screening
  • Legal tech applications requiring a first-pass compliance flag before human review
  • Research on LLM applications in European regulatory compliance

Not intended for:

  • Automated legal decision-making without human oversight
  • Jurisdictions outside the EU/Spain regulatory perimeter
  • Regulations not listed in the covered frameworks above

Citation

@misc{varxai2025arcquasar,
  author       = {vArx-ai},
  title        = {Arc Quasar 7B — Compliance ES},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/varx-ai/arc-quasar-7b-compliance-es}},
}

Arc Quasar is part of the Arc model family by vArx-ai. Arc models are designed for structured reasoning in high-context regulatory and technical domains.
