# RegulaUAE-1.2B: UAE Rulebook Q&A Assistant (Fine-tuned LFM2 Model)

Model ID: rajeshthangaraj1/uae_rule_book_QA_assistant
Base Model: [unsloth/LFM2-1.2B](https://huggingface.co/unsloth/LFM2-1.2B)
## 📘 Model Overview
RegulaUAE-1.2B is a domain-specific conversational language model fine-tuned to answer questions strictly grounded in the UAE Central Bank Rulebook (Banking Regulations).
The model is designed to support regulatory, compliance, and educational use cases, with a strong focus on reducing hallucinations within the UAE banking domain. It has been tested against CBUAE regulatory queries.
Coverage includes:
- Capital Adequacy
- Licensing & Authorization
- Corporate Governance
- Risk Management
- Compliance & Supervisory Frameworks
## 🔍 Key Characteristics
- Domain: UAE Central Bank – Banking Regulations
- Dataset Size: 500,000+ question–answer pairs
- Language: English (complete rulebook coverage)
- Precision: bfloat16
- Task Type: Domain-specific legal & regulatory Q&A
- Framework: Hugging Face transformers
- Pipeline: text-generation with chat template support
## 🎯 Intended Use Cases

### Regulatory & Legal Q&A
- What is the relationship between Decree Law No. (20) of 2018 and Cabinet Decision No. (10) of 2019?
- What minimum capital ratios are specified under Article (2)?

### Compliance & Risk Teams
- Regulatory validation and internal compliance support

### Education & Research
- Learning UAE banking regulations in a conversational format

### AI & FinTech Development
- Base model for regulation-aware RAG systems
## ⚠️ Limitations
- Hallucination Risk: Without retrieval-augmented generation (RAG), the model may generate plausible but incorrect answers in edge cases.
- Domain Scope: Limited to UAE Central Bank banking regulations only.
- Numerical Accuracy: Percentages, ratios, and article references should be verified against the official rulebook.
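To mitigate the hallucination risk noted above, a retrieved rulebook excerpt can be supplied as grounding context at inference time. Below is a minimal sketch of the prompt-construction step only; the `build_messages` helper and the excerpt are illustrative, not part of this model, and the retrieval itself (see the roadmap) is out of scope here.

```python
def build_messages(question, retrieved_excerpt):
    # Prepend the retrieved rulebook excerpt so the model answers from it,
    # rather than from parametric memory alone.
    system = (
        "You are an assistant specialized in the UAE Central Bank Rulebook. "
        "Answer only from the excerpt below. If the answer is not in the "
        "excerpt, reply 'Not found in UAE Rulebook'.\n\n"
        f"Excerpt:\n{retrieved_excerpt}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The resulting list can be passed directly to `tokenizer.apply_chat_template` as in the Transformers example below.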
## 📊 Dataset Creation

### Source Data
Publicly available content from the official UAE Central Bank Rulebook:
https://rulebook.centralbank.ae
### Preprocessing
- Scraped and cleaned official rulebook content
- Segmented into ~65,000 semantically aligned text chunks
- Average chunk size: ~500 characters
- Preserved articles, clauses, and legal definitions
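The chunking step above can be sketched as follows. This is an illustrative reconstruction (the exact splitting logic used for the dataset is not published), packing paragraphs into chunks of roughly 500 characters while keeping paragraph boundaries intact:

```python
def chunk_text(text, target_size=500):
    """Pack paragraphs into chunks of roughly `target_size` characters.

    Paragraphs are never split: a paragraph longer than `target_size`
    becomes its own chunk, so clauses and definitions stay whole.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > target_size:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because paragraphs are only regrouped, joining the chunks back with blank lines reproduces the original text.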
### Q&A Generation
Each chunk was used as grounded context to generate question–answer pairs.
Dataset structure:

```json
{
  "context": "Rulebook text chunk",
  "question": "Regulatory question",
  "answer": "Answer grounded in the context"
}
```
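Records in this structure round-trip cleanly as JSON Lines. A small illustrative sketch (the field values here are placeholders, not actual dataset entries):

```python
import json

# One illustrative record following the dataset schema above.
record = {
    "context": "Article (2): Banks must maintain a minimum capital adequacy ratio.",
    "question": "What does Article (2) require of banks?",
    "answer": "Banks must maintain a minimum capital adequacy ratio.",
}

# Serialize to one JSON Lines entry and parse it back.
line = json.dumps(record, ensure_ascii=False)
parsed = json.loads(line)
```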
## 🧪 Example Usage (Transformers)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "rajeshthangaraj1/uae_rule_book_QA_assistant"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    {"role": "system", "content":
        "You are an assistant specialized in the UAE Central Bank Rulebook. "
        "Only answer based on the UAE Rulebook. "
        "If the answer is not in the Rulebook, reply 'Not found in UAE Rulebook'."},
    {"role": "user", "content":
        "According to the UAE Central Bank Rulebook – Capital Adequacy Section, "
        "what does Article (2) specify about minimum capital ratios?"},
]

# return_dict=True is required so `inputs` is a dict of tensors; without it,
# apply_chat_template returns a bare tensor and `inputs.pop(...)` /
# `model.generate(**inputs)` would fail.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
inputs.pop("token_type_ids", None)

outputs = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated tokens, skipping the prompt.
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(answer)
```
## 🖥️ Example Usage (Gradio)

```python
import gradio as gr

def chat_with_model(message, history):
    # Uses `tokenizer` and `model` from the Transformers example above.
    inputs = tokenizer.apply_chat_template(
        [{"role": "user", "content": message}],
        add_generation_prompt=True, return_dict=True, return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

gr.ChatInterface(fn=chat_with_model, title="UAE Rulebook QA Assistant").launch()
```
## 🔧 Technical Details
- Base Model: LFM2-1.2B
- Fine-tuning: LoRA adapters
- Precision: bfloat16
- Training Stack: Hugging Face transformers + accelerate
## 🛣️ Roadmap
- Retrieval-Augmented Generation (RAG) integration
- Arabic language support
- Enhanced hallucination reduction and safety controls
- Productization for compliance-critical environments
✍️ Author: @rajeshthangaraj1
📅 Last Updated: 2026
