Granite RAG Library

The Granite RAG Library includes six adapters, implemented as LoRA adapters for ibm-granite/granite-4.0-micro. Each adapter expects as input a (single-turn or multi-turn) conversation between a user and an AI assistant, and most also expect a set of grounding passages. Each adapter has been developed for a specific task that is likely to be useful in agentic RAG pipelines. We give a brief overview of the functionality of each adapter below; full details can be found in each individual adapter's README.

Capabilities implemented as LoRA adapters

The six adapters that have been implemented as LoRA adapters for ibm-granite/granite-4.0-micro and made available in this HF repository are:

Query Rewrite (QR): Given a conversation ending with a user query, QR will decontextualize that last user query by rewriting it (whenever necessary) into an equivalent version that is standalone and can be understood by itself. While this adapter is general purpose for any multi-turn conversation, it is especially effective in RAG settings where its ability to rewrite a user query into a standalone version directly improves the retriever performance, which in turn improves the answer generation performance. This is a pre-retrieval adapter since its suggested use is before invoking retrieval.
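To make the pre-retrieval flow concrete, here is a minimal sketch of where QR sits relative to the retriever. Everything here is a hypothetical stand-in: `rewrite_query` and `retrieve` are invented placeholder functions, and the example conversation and its standalone rewrite are fabricated for illustration (see the QR adapter's README for its actual input format).

```python
# Hypothetical sketch of a pre-retrieval query-rewrite step.
# `rewrite_query` stands in for a call to the QR adapter, and
# `retrieve` for a real retriever; both are invented for this sketch.

def rewrite_query(conversation: list[dict]) -> str:
    """Stand-in for the QR adapter: return a standalone version of the
    last user query. A real implementation would invoke the LoRA adapter."""
    # Invented example rewrite: resolves "its" using the earlier turns.
    return "What is the maximum context length of Granite 4.0 Micro?"

def retrieve(query: str) -> list[str]:
    """Stand-in retriever: in practice this would query a search index."""
    return [f"passage retrieved for: {query}"]

conversation = [
    {"role": "user", "content": "Tell me about Granite 4.0 Micro."},
    {"role": "assistant", "content": "It is a small language model from IBM."},
    {"role": "user", "content": "What is its maximum context length?"},
]

standalone = rewrite_query(conversation)  # decontextualized query
passages = retrieve(standalone)           # retriever sees the standalone form
```

The point of the sketch is the ordering: the retriever is called with the rewritten, self-contained query rather than the raw final turn, which is what improves retrieval on context-dependent questions.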

Query Clarification (QC): Given a conversation ending with a user query (and, optionally, relevant content such as RAG documents), QC will detect whether the last user query is underspecified (no clear interpretation, or multiple valid interpretations) and, if so, formulate an appropriate clarification request back to the user. The adapter is designed for conversational use cases where user queries may be ill-formed, unclear, or open to multiple valid interpretations based on the underlying system or content. This adapter is pre-retrieval OR pre-generation since it can be used either before or after invoking retrieval.

Context Relevance (CR): Given a conversation ending with a user query, and an individual passage, CR classifies whether the passage is relevant, partially relevant, or irrelevant for answering the last user query - or if the passage may instead mislead or harm the downstream generator model’s response quality. This is a pre-generation adapter.
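A typical use of CR's three-way label is to filter the retrieved passages before they reach the generator. The following is a hypothetical sketch: `classify_relevance` is an invented stand-in for the CR adapter (with a toy heuristic), and the query and passages are fabricated examples.

```python
# Hypothetical sketch of using CR labels to filter passages before generation.
# `classify_relevance` stands in for the CR adapter; the labels follow the
# three classes described above (relevant / partially relevant / irrelevant).

def classify_relevance(query: str, passage: str) -> str:
    """Stand-in for the CR adapter's per-passage classification.
    The keyword heuristic is invented for illustration only."""
    return "relevant" if "context length" in passage else "irrelevant"

query = "What is the model's maximum context length?"
passages = [
    "The model supports a 128K context length.",
    "The team released the model under Apache 2.0.",
]

# Only passages judged at least partially relevant reach the generator.
kept = [p for p in passages
        if classify_relevance(query, p) in ("relevant", "partially relevant")]
```

Dropping passages labeled irrelevant is what protects the downstream generator from the misleading context the description warns about.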

Answerability Determination (AD): Given a conversation ending with a user query, and a set of passages, AD classifies whether that final user query is answerable or unanswerable based on the available information in the passages. It is valuable for restraining over-eager models by identifying unanswerable queries and preventing the generation of hallucinated responses. It can also be used to indicate that the system should re-query the retriever with alternate formulations, to fetch more relevant passages. This is a pre-generation adapter.
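The re-query pattern mentioned above can be sketched as a simple gate with one retry. All functions here are hypothetical stand-ins (`is_answerable` for the AD adapter, `reformulate` and `retrieve` for the rest of the pipeline), with invented toy logic.

```python
# Hypothetical sketch of AD as a generation gate with one re-query attempt.
# `is_answerable`, `reformulate`, and `retrieve` are invented stand-ins.

def is_answerable(query: str, passages: list[str]) -> bool:
    """Stand-in for the AD adapter: toy check for the query's key term."""
    return any(query.split()[-1].rstrip("?") in p for p in passages)

def reformulate(query: str) -> str:
    """Stand-in for producing an alternate query formulation."""
    return query.replace("cost", "price")

def retrieve(query: str) -> list[str]:
    """Stand-in retriever keyed on the reformulated wording."""
    return ["The listed price is $10."] if "price" in query else ["Unrelated text."]

query = "What is the cost?"
passages = retrieve(query)
if not is_answerable(query, passages):
    query = reformulate(query)   # alternate formulation, as suggested above
    passages = retrieve(query)

answerable = is_answerable(query, passages)
```

If the query is still unanswerable after the retry, the system would decline to answer rather than let the generator hallucinate.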

Hallucination Detection (HD): Given a conversation ending with an assistant response, and a set of passages, HD outputs a hallucination risk range for each sentence in the last assistant response, with respect to the set of passages. This could be used in concert with sampling techniques that yield multiple generated responses, some of which could then be filtered according to their HD scores. This is a post-generation adapter since its expected use is after invoking the LLM to create the response.
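The sampling-plus-filtering idea can be sketched as follows. `sentence_risks` is an invented stand-in for the HD adapter (a toy word-overlap score, not the adapter's actual risk model), and the candidates and threshold are fabricated for illustration.

```python
# Hypothetical sketch of combining sampling with HD scores: generate several
# candidate responses, score each sentence's hallucination risk with a
# stand-in function, and keep candidates whose worst sentence is low-risk.

def sentence_risks(response: str, passages: list[str]) -> list[float]:
    """Stand-in for the HD adapter: one risk score per response sentence.
    Invented scoring: low word overlap with the passages means high risk."""
    passage_words = set(" ".join(passages).lower().split())
    risks = []
    for sent in response.split(". "):
        words = set(sent.lower().rstrip(".").split())
        overlap = len(words & passage_words) / max(len(words), 1)
        risks.append(1.0 - overlap)
    return risks

passages = ["The library has six adapters."]
candidates = [
    "The library has six adapters.",
    "The library has six adapters. They were released in 1995.",
]

RISK_THRESHOLD = 0.5
kept = [c for c in candidates
        if max(sentence_risks(c, passages)) <= RISK_THRESHOLD]
```

Using the maximum per-sentence risk as the filter criterion discards any candidate containing even one ungrounded sentence, which is usually the desired behavior for grounded RAG responses.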

Citation Generation (CG): Given a conversation ending with an assistant response, and a set of passages, CG generates citations for that last assistant response from the provided passages. Citations are generated for each sentence in the response (when available), where each citation consists of a set of sentences from the supporting passages. This is a post-generation adapter since its expected use is after invoking the LLM, and therefore can be used to create citations for responses generated by any model.
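Putting the stages together, an agentic RAG loop might slot the adapters in as follows. This is a structural sketch only: every function body is a minimal invented stand-in, and only the ordering (pre-retrieval, then pre-generation, then post-generation) mirrors the descriptions above.

```python
# Structural sketch of where each adapter sits in a RAG pipeline.
# All functions are hypothetical stand-ins; only the stage ordering is real.

def answer(conversation):
    query = qr_rewrite(conversation)                    # pre-retrieval: QR
    clarification = qc_check(conversation)              # pre-retrieval: QC
    if clarification:
        return clarification
    passages = retrieve(query)
    passages = [p for p in passages
                if cr_label(query, p) != "irrelevant"]  # pre-generation: CR
    if not ad_answerable(query, passages):              # pre-generation: AD
        return "I don't have enough information to answer that."
    response = generate(query, passages)
    if max(hd_risks(response, passages)) > 0.5:         # post-generation: HD
        return "I don't have enough information to answer that."
    return response, cg_citations(response, passages)   # post-generation: CG

# Minimal stand-ins so the sketch runs end to end.
def qr_rewrite(conv): return conv[-1]["content"]
def qc_check(conv): return None
def retrieve(q): return ["Granite 4.0 Micro is an IBM model."]
def cr_label(q, p): return "relevant"
def ad_answerable(q, ps): return True
def generate(q, ps): return "It is an IBM model."
def hd_risks(r, ps): return [0.1]
def cg_citations(r, ps): return {r: ps}

result = answer([{"role": "user", "content": "What is Granite 4.0 Micro?"}])
```

In a real pipeline each stand-in would be replaced by a call to the corresponding LoRA adapter (or to the retriever and generator), but the control flow stays the same.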

Recommended Use

The recommended way to call all adapters is through the Mellea framework. For code snippets demonstrating how to use them, please refer to the Mellea intrinsics examples.
