| --- |
| license: cc-by-nc-sa-4.0 |
| language: |
| - en |
| library_name: transformers |
| pipeline_tag: text-generation |
| tags: |
| - finance |
| - legal |
| --- |
| # Model Card for RegLLM |
|
|
| <!-- Provide a quick summary of what the model is/does. --> |
|
|
| RegLLM is a large language model for regulatory compliance. It was domain-adapted through unsupervised pretraining and then instruction-finetuned for regulatory compliance. |
| This release focuses on Indian banking rules and regulations. |
|
|
| ## Model Details |
|
|
| ### Model Description |
|
|
| <!-- Provide a longer summary of what this model is. --> |
|
|
| - **Developed by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
| - **Funded by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
| - **Shared by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
| - **Model type:** PhiForCausalLM |
| - **Language(s) (NLP):** English |
| - **License:** [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en). The model is available for non-commercial research use only. For commercial use, please contact contactus@dataeaze.io. |
| - **Finetuned from model:** [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) |
|
|
|
|
| ## Uses |
|
|
| <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
| ### Direct Use |
|
|
| <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
|
|
| The model is designed to give precise answers to a wide range of questions about Indian banking regulations. |
|
|
| ### Downstream Use |
|
|
| <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
|
|
| This model can be used as a core component of RegTech applications. |
|
|
| ### Out-of-Scope Use |
|
|
| <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
|
| The model has been finetuned for the specific task of answering questions about Indian regulatory compliance. |
| Accuracy is not guaranteed for any other use. |
|
|
| ## Bias, Risks, and Limitations |
|
|
| <!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
| - **Bias:** Trained for the English language only (as of now). |
| - **Risk:** Guardrails rely on those of the base model, Microsoft Phi-2. Finetuning may have altered this behaviour. |
| - **Limitations:** Intended to be a small model optimised for Indian regulations (as of now). |
|
|
|
|
| ### Recommendations |
|
|
| <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> |
|
|
| * This model is intended as an assistive AI technology. Verify its answers against the source documents before making decisions. |
| * Use this model with grounding on a set of regulatory documents. |
|
|
| ## How to Get Started with the Model |
|
|
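| ```python |
| import torch |
| from transformers import pipeline |
| |
| # Hub repository ID as given in this card. |
| pipe = pipeline( |
|     "text-generation", |
|     model="dataeaze/dataeaze-RegLLM-zephyr_7b_beta-dzcompli", |
|     torch_dtype=torch.bfloat16,  # half-precision weights to reduce memory use |
|     device_map="auto",  # place the model on available devices (requires accelerate) |
| ) |
| |
| messages = [ |
|     { |
|         "role": "system", |
|         "content": "You are a compliance assistant who answers in a formal manner", |
|     }, |
|     {"role": "user", "content": "How often should IRRBB policies be reviewed?"}, |
| ] |
| # Render the chat messages into a single prompt string using the tokenizer's chat template. |
| prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) |
| # A low temperature keeps sampling close to greedy decoding. |
| outputs = pipe(prompt, max_new_tokens=120, do_sample=True, temperature=0.1, top_k=50, top_p=0.95) |
| # By default, generated_text contains the prompt followed by the model's answer. |
| print(outputs[0]["generated_text"]) |
| ``` |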
|
|
| ## Sample Output |
|
|
| ### Question |
| How often should IRRBB policies be reviewed? |
|
|
| ### RegLLM response |
|
|
| ``` |
| <|system|> |
| You are a compliance assistant who answers in a formal manner</s> |
| <|user|> |
| How often should IRRBB policies be reviewed?</s> |
| <|assistant|> |
| At least annually. |
| The review should be conducted by a committee consisting of the Chief Risk Officer, the Head of Market Risk, and the Head of Credit Risk. |
| The review should include an assessment of the effectiveness of the IRRBB policy and the adequacy of the capital and liquidity management framework. |
| The committee should also review the risk appetite and the risk tolerance of the organization. |
| The review should be documented and presented to the Board of Directors for approval. |
| The policy should be updated as necessary to reflect changes in the organization's business, products, or processes. |
| ``` |
|
|
| ### GPT-4 response |
|  |
|
|
|
|
| ### Reference |
|
|
| To evaluate the truthfulness of this response and check for hallucination, refer to RBI notification |
| [RBI/2022-23/180 DOR.MRG.REC.102/00-00-009/2022-23](https://rbidocs.rbi.org.in/rdocs/notification/PDFs/NOTI180CF30A8446A704C11BD8267A8D0BB2AC2.PDF) (page 8). |
|
|
| A screenshot of the relevant passage: |
|
|
| <img src="rbi_reference.png" alt="drawing" width="500"/> |
|
|
|
|
| RegLLM identifies the required review frequency for IRRBB policies, whereas GPT-4 gives a more general response. |
| Note that RegLLM produced this response without access to any external knowledge. |
| When coupled with a retriever model, RegLLM can provide fairly precise responses to user queries related to regulatory compliance. |
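
As a rough illustration of the retriever coupling, the sketch below shows one way retrieved passages might be placed into the prompt before the model is called. Everything in it is an assumption made for illustration: the word-overlap `retrieve` function stands in for a real retriever model, the `PASSAGES` strings are placeholder text rather than wording from any RBI notification, and `build_grounded_messages` is a hypothetical helper, not part of RegLLM or `transformers`.

```python
import re

# Placeholder passages standing in for chunks of regulatory documents.
# Illustrative text only, not quotations from any RBI notification.
PASSAGES = [
    "Placeholder passage A: the board reviews the interest rate risk policy periodically.",
    "Placeholder passage B: customer identification records are retained by the bank.",
    "Placeholder passage C: liquidity ratios are reported to the regulator.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, top_k=1):
    """Toy retriever (assumption): rank passages by words shared with the query."""
    query_words = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(query_words & tokenize(p)), reverse=True)
    return ranked[:top_k]


def build_grounded_messages(query, passages, top_k=1):
    """Build chat messages whose system prompt carries the retrieved context."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, top_k))
    system = (
        "You are a compliance assistant who answers in a formal manner. "
        "Answer using only the context below.\n" + context
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": query},
    ]


messages = build_grounded_messages(
    "How often is the interest rate risk policy reviewed?", PASSAGES
)
print(messages[0]["content"])
```

The resulting `messages` list has the same shape as the one in the getting-started snippet, so it could be passed to `pipe.tokenizer.apply_chat_template` in the same way. In practice the toy scorer would be replaced by a proper retriever over the regulatory corpus.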
|
|
| Keep watching this space for more updates on the model and evaluations. |
|
|
| ## Model Card Authors |
|
|
| * Niranjan Kakade |
| * Atharva Inamdar |
| * Tony Tom |
| * Nayan Chheda |
| * Sourabh Daptardar |
|
|
| ## Model Card Contact |
|
|
| "dataeaze systems" <contactus@dataeaze.io> |
|
|