Access requires acceptance of the LëtzSDG Terms of Use

The information you provide will be collected solely for access management. Your information will not be shared publicly or used for any commercial purpose.

LETZSDG TERMS OF USE

By requesting access to LëtzSDG, you acknowledge that you have read, understood, and agreed to the following conditions.
Model Origin: This is a BERT-based model trained on raw data from FineWeb and synthetic data generated by google/gemma-3-27b-it, Qwen/Qwen2.5-32B-Instruct, and mistralai/Mistral-Small-24B-Instruct-2503.

  1. Gemma 3 License Compliance (Synthetic Data)
    Because this model was trained on data generated by Gemma 3, it is classified
    as a "Model Derivative" under the Gemma Terms of Use.
  • You agree not to use this model for any restricted uses set forth in the
    Gemma Prohibited Use Policy (ai.google.dev/gemma/prohibited_use_policy).
  • You release Google from any liability regarding the outputs of this model.
  2. Purpose of Use
    LëtzSDG is released strictly for research, educational, and experimental purposes.
  • It must not be used for financial decision-making, investment recommendation,
    portfolio management, or any other regulated activity.
  • It must not be used in violation of any applicable laws or regulations,
    or for harmful, abusive, fraudulent, or deceptive activities.
  3. Limitation of Liability and Responsibility
  • The sponsoring institutions and model developers make no warranties regarding the accuracy or suitability of the model.
  • The sponsoring institutions and model developers shall not be liable for any direct, indirect, incidental, consequential, or special damages arising from its use.
  • The end user assumes full responsibility for the outcomes derived from this model.
  4. Data Transparency
    No personal data or sensitive information is intended to be included in this model.

  5. Non-Endorsement
    The inclusion of the names of the partner institutions does not imply endorsement or certification of specific outcomes derived from this model.

By requesting access, you agree to be bound by these terms.


💚 LëtzSDG: A BERT Model for SDGs Classification

🔍 A model for classifying text based on the United Nations Sustainable Development Goals (SDGs).

📄 Presented at the 2nd IEEE International Workshop on Large Language Models for Finance, co-located with the 2025 IEEE International Conference on Big Data (IEEE BigData 2025).

🌍 Overview

LëtzSDG is a 110M-parameter BERT-based multiclass classifier fine-tuned to identify text excerpts related to the 17 UN Sustainable Development Goals (SDGs).

Developed for sustainable finance, this model supports on-premises, auditable, human-in-the-loop workflows in compliance with the EU AI Act.

Unlike cloud-hosted solutions, LëtzSDG can run locally, ensuring data privacy, traceability, and transparency.

⚙️ Model Details

  • Base model: bert-base-uncased
  • Task type: Multiclass text classification (17 SDGs)
  • Training epochs: 3
  • Batch size: 16
  • Optimizer: AdamW (lr = 2e-5, weight_decay = 0.01)
  • Precision: bfloat16
  • Max sequence length: 512 tokens
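Because BERT caps input at 512 tokens, longer documents must either be truncated (as the pipeline example below does) or split into overlapping windows and classified piece by piece. The following is a minimal, dependency-free sketch of such a windowing step; the function name and the stride value are illustrative, not part of the released model.

```python
def sliding_windows(token_ids, max_len=512, stride=256):
    """Split a token-id sequence into overlapping windows of at most
    max_len tokens, stepping forward by stride each time.
    Hypothetical helper for documents longer than BERT's 512-token limit."""
    if len(token_ids) <= max_len:
        return [token_ids]
    windows = []
    for start in range(0, len(token_ids), stride):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return windows

# Example: a 1200-token document yields four overlapping windows.
chunks = sliding_windows(list(range(1200)))
print([len(c) for c in chunks])
# [512, 512, 512, 432]
```

Each window can then be classified independently and the per-class scores aggregated (e.g., averaged) to label the full document.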

🚀 Quick Start

🔹 Simple inference with the Hugging Face pipeline

```python
# pip install transformers
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="lrsbrgrn/LetzSDG-1.0",
    truncation=True,
)

text = "The company introduced equal pay policies and increased women’s representation in leadership roles."
result = pipe(text)

print(result)
# [{'label': 'SDG_5_GENDER_EQUALITY', 'score': 0.9993481040000916}]
```
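The `score` returned by the pipeline is a softmax probability over the model's 17 class logits. A pure-Python sketch of that post-processing step (the logit values below are made up to mirror a high-confidence prediction; they are not actual model outputs):

```python
import math

def softmax(logits):
    """Convert raw classifier logits into probabilities.
    Subtracting the max first keeps the exponentials numerically stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative 17-class logit vector: one class strongly dominates,
# mirroring the near-1.0 score in the pipeline output above.
logits = [9.5] + [0.5] * 16
probs = softmax(logits)
print(round(max(probs), 4), probs.index(max(probs)))
# 0.998 0
```

Running the pipeline with `top_k=None` returns the full probability distribution rather than only the argmax label.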

📚 Citation

If you use this work, please cite:

```bibtex
@inproceedings{11402269,
  author    = {Bergeron, Loris and Francois, Jerome and State, Radu and Hilger, Jean},
  booktitle = {2025 IEEE International Conference on Big Data (BigData)},
  title     = {Leveraging Large Language Models to Build Computationally Efficient Models for Sustainable Finance Investment Decision Support},
  year      = {2025},
  pages     = {7123-7132},
  abstract  = {Assessing companies' contributions to the United Nations Sustainable Development Goals (SDGs) is essential for sustainable investment and regulatory reporting. However, extracting reliable insights from heterogeneous textual sources remains a challenge due to limited labeled data, domain imbalance, and privacy constraints. We present LëtzSDG, a lightweight BERT-based multiclass classifier fine-tuned on a hybrid dataset constructed using Large Language Models (LLMs). Multiple LLMs are used to (i) expand domain-specific SDG keywords, (ii) perform consensus-based zero-shot labeling, and (iii) generate synthetic data to balance underrepresented classes. Unlike cloud-hosted LLMs, LëtzSDG is designed for on-premises deployment within financial institutions, ensuring data-privacy compliance. Integrated into a human-in-the-loop investment workflow, its predictions are span-linked for traceability and committee review. Evaluated on public datasets (the OSDG Community Dataset and the SDG Integration Corpus), LëtzSDG outperforms SDG-specific baselines, a strong NLI-based zero-shot model, and several open LLMs, while rivaling larger closed models at a fraction of their size. LëtzSDG and its datasets are publicly available on Hugging Face.},
  keywords  = {Computational modeling; Large language models; Text categorization; Finance; Data models; Human in the loop; Labeling; Sustainable development; Investment; Synthetic data},
  doi       = {10.1109/BigData66926.2025.11402269},
  url       = {https://doi.ieeecomputersociety.org/10.1109/BigData66926.2025.11402269},
  publisher = {IEEE Computer Society},
  address   = {Los Alamitos, CA, USA},
  month     = dec
}
```

🏗️ Data Sources & Attribution

The model was trained using a combination of real-world data and synthetic data generated by Large Language Models.

Note: Because this model was trained on synthetic data generated by Gemma 3, it is classified as a "Model Derivative" under the Gemma Terms of Use and is therefore subject to the Gemma Prohibited Use Policy.
