Context Attribution
Model Summary
Context Attribution is a purpose-built LoRA for ibm-granite/granite-4.0-micro that predicts which sentences in the context were most important for granite-4.0-micro in generating each sentence of its response. Here, context includes previous conversation turns as well as any documents provided to the granite-4.0-micro model. The context attribution LoRA thus helps to explain granite-4.0-micro's behavior, specifically how its probability of generating a certain response sentence is affected by different parts of the context.
The context attribution model takes the form of a LoRA adapter. The adapter greatly improves granite-4.0-micro's ability to attribute to context (see Evaluation section below) while retaining other capabilities of the base model. The context attribution adapter identifies which context sentences actually influenced a model’s response, while IBM Granite’s citation generation adapter highlights sentences that support the response regardless of whether the model used them.
- Developer: IBM Research
- HF Collection: Granite Libraries
- GitHub Repository: https://github.com/ibm-granite
- Release Date: March 18th, 2026
- Model Type: LoRA adapter for ibm-granite/granite-4.0-micro
- License: Apache 2.0
- Paper: Context Attribution is trained to approximate the importance ranking of context sentences provided by the MExGen method in [Monteiro Paes and Wei et al., ACL 2025] Multi-Level Explanations for Generative Language Models.
Usage
Intended use: Context Attribution is a context attribution adapter intended for IBM's granite-4.0-micro LLM. It enables the base model to accurately identify which sentences in the context (including previous conversation turns and documents) were most important to the model when generating each sentence in its response. The adapter thus helps to explain the base model's behavior, specifically how its probability of generating a certain response sentence is affected by parts of the context. This adapter is designed to be used as part of the Granite inference pipeline. It is intended to be called after the base model generates a response to provide a post hoc explanation.
The context attribution adapter is similar to the citations adapter found in the granitelib-rag-r1.0 library, in that it uses a comparable input/output format and similar training data. However, the two differ in the type of attribution they provide. Citation generation provides corroborative attribution, identifying document sentences that best support a response regardless of whether the model relied on them, making it model-agnostic. In contrast, the context attribution adapter provides contributive attribution, identifying the context sentences that actually influenced a specific model's response. The context attribution adapter also covers prior conversation turns (not just documents) and ranks context sentences by importance rather than listing them unordered.
Input Format: The context attribution LoRA expects the input to be processed as follows:
1. The last assistant response (the response to be attributed) should be split into sentences and the sentences numbered with tags as follows: "<r0> sentence 0 <r1> sentence 1 ... ".
2. The context (documents and previous conversation turns) should also be split into sentences and tagged as "<c0> sentence 0 <c1> sentence 1 ... ". The numbering of context sentences starts with the first document (if present) and continues as a single sequence through all the documents and then previous conversation turns, in that order.
The ability to perform this input processing automatically will soon be available from IBM's mellea package (more specifically, it will be provided by the granite-common package that will be integrated into mellea). The Quickstart Example below shows approximately how this automated input processing will be called. Alternatively, the user can insert sentence tags manually or by other means (also shown in the Quickstart Example below).
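For readers who want to tag inputs by hand, the numbering scheme described above can be sketched in a few lines of Python. This is only an illustration: the helper names (`split_sentences`, `tag_text`, `tag_inputs`) are hypothetical, and the regex-based sentence splitter is a naive stand-in that will not reproduce the sentence boundaries of the `granite-common` tooling (for example, it splits after abbreviations such as "Jr.").

```python
import re

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tag_text(text, prefix, start):
    """Tag sentences as '<{prefix}{n}> sentence'; return the tagged text and the next free index."""
    sentences = split_sentences(text)
    tagged = " ".join(f"<{prefix}{start + i}> {s}" for i, s in enumerate(sentences))
    return tagged, start + len(sentences)

def tag_inputs(documents, messages):
    """Number context sentences through all documents first, then earlier turns.

    The last message is assumed to be the assistant response to attribute;
    it is tagged separately with <r0>, <r1>, ...
    """
    next_c = 0
    docs_out = []
    for doc in documents:
        tagged, next_c = tag_text(doc["text"], "c", next_c)
        docs_out.append({**doc, "text": tagged})
    msgs_out = []
    for msg in messages[:-1]:
        tagged, next_c = tag_text(msg["content"], "c", next_c)
        msgs_out.append({**msg, "content": tagged})
    tagged_response, _ = tag_text(messages[-1]["content"], "r", 0)
    msgs_out.append({**messages[-1], "content": tagged_response})
    return docs_out, msgs_out
```

The attribution instruction (the final user message in the Quickstart Example) would still need to be appended after the tagged response.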
Output Format: The context attribution LoRA produces a JSON array of the form [{"r": 0, "c": [3, 1, 4]}, {"r": 1, "c": [1, 5]}, ... ]. Each JSON object in the array corresponds to a response sentence and has two members: "r" with the response sentence number as the value, and "c" with an array of context sentence numbers as the value. The context sentences are listed in order of decreasing importance for each response sentence (in the example above, context sentence 3 is identified as the most important for the base model to generate response sentence 0).
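Because the output is generated text, it can be worth validating before use. The following is a minimal sketch, assuming the output parses as a JSON array of the form described above; the function name `parse_attributions` and the choice to silently drop out-of-range or duplicate indices are illustrative assumptions, not documented behavior of the adapter or of `granite-common`.

```python
import json

def parse_attributions(output_text, num_response, num_context):
    """Return {response index: ranked context indices}.

    Every response sentence gets an entry (possibly empty). Indices that are
    out of range, non-integer, or repeated are dropped; ranking order is kept.
    """
    parsed = json.loads(output_text)
    result = {r: [] for r in range(num_response)}
    for item in parsed:
        r = item.get("r")
        if not isinstance(r, int) or r not in result:
            continue
        seen = set()
        for c in item.get("c", []):
            if isinstance(c, int) and 0 <= c < num_context and c not in seen:
                seen.add(c)
                result[r].append(c)
    return result

example = '[{"r": 0, "c": [3, 1, 4]}, {"r": 1, "c": [1, 5]}]'
print(parse_attributions(example, num_response=2, num_context=6))
```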
Use Cases
- Human understanding: Context attribution helps to explain a base LLM by showing which parts of the context were most important to it in generating a certain response. Such views into the LLM's behavior may help users calibrate their trust in using the LLM in a particular domain.
- RAG: The Granite 4.0 Micro - Context Attribution adapter is trained on a retrieval-augmented generation (RAG) dataset, specifically on document-grounded multi-turn question answering. The Evaluation section below shows that it also performs well when the use case is document summarization. It is thus especially suitable for RAG scenarios where users may wish to attribute LLM responses to a sizable context.
- Context reliance assessment: Context attribution can also be used to assess whether an LLM makes use of its context at all for a specific response. Low attribution scores to all context sentences might mean one of several things: 1) The response sentence is strongly implied by previous response sentences and does not need to refer to context; 2) the response comes primarily from the LLM's parametric knowledge; 3) the response is hallucinated. As such, determining the main reason for context non-reliance would require further investigation.
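As a rough illustration of the context reliance use case, the sketch below summarizes, for each response sentence, how many attributed context sentences come from documents versus previous conversation turns. It assumes the adapter's output has already been parsed into a list of objects, and it relies on the numbering convention that document sentences come first; `summarize_reliance` and `num_doc_sentences` are hypothetical names. Since the adapter returns ranked lists rather than scores, an empty list is used here only as a crude proxy for low context reliance.

```python
def summarize_reliance(attributions, num_doc_sentences):
    """Count attributed context sentences per response sentence, split by source."""
    summary = {}
    for item in attributions:
        cited = item["c"]
        from_docs = sum(1 for c in cited if c < num_doc_sentences)
        summary[item["r"]] = {
            "total": len(cited),
            "documents": from_docs,
            "turns": len(cited) - from_docs,
        }
    return summary

def unattributed(attributions):
    """Response sentences with an empty list; these may merit further investigation."""
    return [item["r"] for item in attributions if not item["c"]]
```

An empty list does not by itself distinguish between the three explanations listed above.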
Quickstart Example (LoRA)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
BASE_NAME = "ibm-granite/granite-4.0-micro"
ADAPTER_REPO = "ibm-granite/granitelib-core-r1.0"
ADAPTER_SUBFOLDER = "context-attribution/granite-4.0-micro/lora"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Load context attribution model (base model + LoRA)
tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side="left", trust_remote_code=True)
model_ca = PeftModel.from_pretrained(
    AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map="auto", dtype=torch.bfloat16),
    ADAPTER_REPO,
    subfolder=ADAPTER_SUBFOLDER,
)
# Documents provided as context
documents = [
    {"doc_id": "0", "text": "The Plastic Ono Band is a band formed by John Lennon and Yoko Ono in 1969 as a vehicle for their collaborative and solo projects. Lennon and Ono had begun a personal and artistic relationship in 1968, collaborating on several experimental releases. Following their marriage in 1969, they decided that all of their future endeavours would be credited to a conceptual and collaborative vehicle, Plastic Ono Band. The band would go on to feature a rotating lineup of many musicians, including Eric Clapton, Klaus Voormann, Alan White, Billy Preston, Jim Keltner, Delaney & Bonnie and Friends, and Lennon's former Beatles bandmates George Harrison and Ringo Starr. Lennon and Ono left the UK to settle in New York City during the fall of 1971. In Greenwich Village, the couple became more politically active and began writing protest songs. These songs became the basis for their next album, Some Time in New York City. As backing, they enlisted the help of New York band Elephant's Memory, consisting of guitarist Wayne 'Tex' Gabriel, bassist Gary Van Scyoc, saxophonist Stan Bronstein, keyboardist Adam Ippolito, keyboardist John La Boosca, and drummer Richard Frank, Jr. Phil Spector produced, and Jim Keltner also played on the album. The album was released on 12 June 1972, credited to \"John & Yoko/Plastic Ono Band with Elephant's Memory plus Invisible Strings\". Some Time in New York City included a second disc, entitled Live Jam, which included the recordings from the 1969 Peace for Christmas concert and the 1971 performance with Frank Zappa. Ono and Lennon continued their work with Elephant's Memory throughout 1972, performing as the Plastic Ono Elephant's Memory Band (which also included Jim Keltner). On 30 August, they performed a pair of benefit concerts at Madison Square Garden. The benefit, entitled \"One to One\", was organised by Geraldo Rivera to raise money for children with mental challenges. By this time, La Boosca had departed the band, and the concert saw the addition of John Ward on bass. The concert was filmed and recorded, later released in February 1986 as the album Live In New York City. They also performed at the Jerry Lewis MDA Labor Day Telethon. The last collaboration of the Plastic Ono Elephant's Memory Band was Ono's double album Approximately Infinite Universe. It was recorded throughout the fall of 1972, and was released in January 1973."},
    {"doc_id": "1", "text": "The Plastic Ono Band is a band formed by John Lennon and Yoko Ono in 1969 as a vehicle for their collaborative and solo projects. Lennon and Ono had begun a personal and artistic relationship in 1968, collaborating on several experimental releases. Following their marriage in 1969, they decided that all of their future endeavours would be credited to a conceptual and collaborative vehicle, Plastic Ono Band. The band would go on to feature a rotating lineup of many musicians, including Eric Clapton, Klaus Voormann, Alan White, Billy Preston, Jim Keltner, Delaney & Bonnie and Friends, and Lennon's former Beatles bandmates George Harrison and Ringo Starr. By the beginning of 1973, recording had begun in earnest on Ono's next album, Feeling the Space, featuring a new group of studio musicians. The newest incarnation of the Plastic Ono Band featured guitarist David Spinozza, keyboardist Ken Ascher, bassist Gordon Edwards, percussionists Arthur Jenkins and David Friedman, saxophonist Michael Brecker, pedal steel guitarist Sneaky Pete Kleinow, as well as regular contributor Jim Keltner. The album would be released in November. Throughout 1973, Lennon and Ono's relationship became strained. By August, the two had begun a period of separation that Lennon called \"The Lost Weekend\". Lennon began the recording of his own album, Mind Games, using the same players as on Feeling the Space, dubbed \"The Plastic U.F.Ono Band\". Around the time of the album's release in November, Lennon moved to Los Angeles with new lover May Pang. In October, Lennon began the recording of an album of rock 'n' roll oldies (a contractual obligation due to a lawsuit). These featured many Plastic Ono Band regulars (including much of the \"U.F.Ono Band\", Klaus Voorman, and the return of Phil Spector to the production chair), but upon release in 1975 as Rock 'n' Roll, it was credited to Lennon alone. The sessions for Rock 'n' Roll were extremely troubled, and the sessions were abandoned until a later date. In July 1974, Lennon returned to New York to record Walls and Bridges. The new \"Plastic Ono Nuclear Band\" featured both old and new faces, with Jim Keltner, Kenneth Ascher, and Arthur Jenkins continuing from Mind Games, the returns of Klaus Voorman, Nicky Hopkins, and Bobby Keys, and the addition of guitarists Jesse Ed Davis and Eddie Mottau. Recording was finished in August, and the album was released 26 September and 4 October in the US and UK respectively. Walls and Bridges would prove to be the last release of new material by the Plastic Ono Band in the 1970s. Lennon subsequently returned to his marriage with Ono and retired from music following the birth of his son Sean. The compilation Shaved Fish was released in October 1975, Lennon's last release credited to the Plastic Ono Band. Upon his and Ono's return to music in 1980 for the album Double Fantasy, they played with an all-new group of studio musicians who were not billed as any variation of the Plastic Ono Band name. Lennon was shot and killed shortly after the release of the album."},
    {"doc_id": "2", "text": "John Winston Ono Lennon (9 October 1940 - 8 December 1980) was an English singer, songwriter, and peace activist who co-founded the Beatles, the most commercially successful band in the history of popular music. He and fellow member Paul McCartney formed a much-celebrated songwriting partnership. Along with George Harrison and Ringo Starr, the group would ascend to world-wide fame during the 1960s. During his marriage to Cynthia, Lennon's first son Julian was born at the same time that his commitments with the Beatles were intensifying at the height of Beatlemania. Lennon was touring with the Beatles when Julian was born on 8 April 1963. Julian's birth, like his mother Cynthia's marriage to Lennon, was kept secret because Epstein was convinced that public knowledge of such things would threaten the Beatles' commercial success. Julian recalled that as a small child in Weybridge some four years later, \"I was trundled home from school and came walking up with one of my watercolour paintings. It was just a bunch of stars and this blonde girl I knew at school. And Dad said, 'What's this?' I said, 'It's Lucy in the sky with diamonds.'\" Lennon used it as the title of a Beatles song, and though it was later reported to have been derived from the initials LSD, Lennon insisted, \"It's not an acid song.\" McCartney corroborated Lennon's explanation that Julian innocently came up with the name. Lennon was distant from Julian, who felt closer to McCartney than to his father. During a car journey to visit Cynthia and Julian during Lennon's divorce, McCartney composed a song, \"Hey Jules\", to comfort him. It would evolve into the Beatles song \"Hey Jude\". Lennon later said, \"That's his best song. It started off as a song about my son Julian ... he turned it into 'Hey Jude'. I always thought it was about me and Yoko but he said it wasn't.\" Lennon's relationship with Julian was already strained, and after Lennon and Ono moved to Manhattan in 1971, Julian would not see his father again until 1973. With Pang's encouragement, arrangements were made for Julian (and his mother) to visit Lennon in Los Angeles, where they went to Disneyland. Julian started to see his father regularly, and Lennon gave him a drumming part on a Walls and Bridges track. He bought Julian a Gibson Les Paul guitar and other instruments, and encouraged his interest in music by demonstrating guitar chord techniques. Julian recalls that he and his father \"got on a great deal better\" during the time he spent in New York: \"We had a lot of fun, laughed a lot and had a great time in general.\" In a Playboy interview with David Sheff shortly before his death, Lennon said, \"Sean was a planned child, and therein lies the difference. I don't love Julian any less as a child. He's still my son, whether he came from a bottle of whiskey or because they didn't have pills in those days. He's here, he belongs to me, and he always will.\" He said he was trying to re-establish a connection with the then 17-year-old, and confidently predicted, \"Julian and I will have a relationship in the future.\" After his death it was revealed that he had left Julian very little in his will."},
]
# Previous conversation turns
messages = [
    {"role": "user", "content": "Who were the members of The Metal Ono Band, which was formed by Yoko Ono in 1976 to explore her interest in heavy metal music?"},
    {"role": "assistant", "content": "I'm sorry, but I don't have the data to answer that specific question. "},
]
# Current question
question = "What was the concept behind the formation of the Plastic Ono Band?"
messages.append({"role": "user", "content": question})
# Response pre-generated by granite-4.0-micro, could also generate a fresh response
response = "The Plastic Ono Band was formed by John Lennon and Yoko Ono in 1969 as a collaborative vehicle for their artistic and personal projects. They decided to credit all their future efforts to this conceptual and collaborative group after their marriage in 1969."
messages.append({"role": "assistant", "content": response})
# The context attribution adapter expects the last assistant response and context (documents
# + previous conversation turns) to be split into sentences and the sentences to be tagged with numbers.
# Here we either assume that this input processing has been done,
# or use the `granite_common` package to process the inputs.
use_granite_common = False
if use_granite_common:
    # NOTE: `granite_common` has been integrated into the `mellea` package.
    # To use the `granite_common` functionality in `mellea`, the following import statement will be different,
    # and the exact syntax below may also change.
    import granite_common
    # Create rewriter
    rewriter_config_file = "path/to/rewriter/config"
    rewriter = granite_common.IntrinsicsRewriter(config_file=rewriter_config_file)
    # Process inputs
    request_dict = {"messages": messages, "extra_body": {"documents": documents}}
    request_rewritten = rewriter.transform(request_dict).model_dump()
    # Extract fields
    documents_rewritten = request_rewritten["extra_body"]["documents"]
    messages_rewritten = request_rewritten["messages"]
else:
    # Use pre-processed inputs
    documents_rewritten = [
        {"doc_id": "0", "text": "<c0> The Plastic Ono Band is a band formed by John Lennon and Yoko Ono in 1969 as a vehicle for their collaborative and solo projects. <c1> Lennon and Ono had begun a personal and artistic relationship in 1968, collaborating on several experimental releases. <c2> Following their marriage in 1969, they decided that all of their future endeavours would be credited to a conceptual and collaborative vehicle, Plastic Ono Band. <c3> The band would go on to feature a rotating lineup of many musicians, including Eric Clapton, Klaus Voormann, Alan White, Billy Preston, Jim Keltner, Delaney & Bonnie and Friends, and Lennon's former Beatles bandmates George Harrison and Ringo Starr. <c4> Lennon and Ono left the UK to settle in New York City during the fall of 1971. <c5> In Greenwich Village, the couple became more politically active and began writing protest songs. <c6> These songs became the basis for their next album, Some Time in New York City. <c7> As backing, they enlisted the help of New York band Elephant's Memory, consisting of guitarist Wayne 'Tex' Gabriel, bassist Gary Van Scyoc, saxophonist Stan Bronstein, keyboardist Adam Ippolito, keyboardist John La Boosca, and drummer Richard Frank, Jr. <c8> Phil Spector produced, and Jim Keltner also played on the album. <c9> The album was released on 12 June 1972, credited to \"John & Yoko/Plastic Ono Band with Elephant's Memory plus Invisible Strings\". <c10> Some Time in New York City included a second disc, entitled Live Jam, which included the recordings from the 1969 Peace for Christmas concert and the 1971 performance with Frank Zappa. <c11> Ono and Lennon continued their work with Elephant's Memory throughout 1972, performing as the Plastic Ono Elephant's Memory Band (which also included Jim Keltner). <c12> On 30 August, they performed a pair of benefit concerts at Madison Square Garden. <c13> The benefit, entitled \"One to One\", was organised by Geraldo Rivera to raise money for children with mental challenges. <c14> By this time, La Boosca had departed the band, and the concert saw the addition of John Ward on bass. <c15> The concert was filmed and recorded, later released in February 1986 as the album Live In New York City. <c16> They also performed at the Jerry Lewis MDA Labor Day Telethon. <c17> The last collaboration of the Plastic Ono Elephant's Memory Band was Ono's double album Approximately Infinite Universe. <c18> It was recorded throughout the fall of 1972, and was released in January 1973."},
        {"doc_id": "1", "text": "<c19> The Plastic Ono Band is a band formed by John Lennon and Yoko Ono in 1969 as a vehicle for their collaborative and solo projects. <c20> Lennon and Ono had begun a personal and artistic relationship in 1968, collaborating on several experimental releases. <c21> Following their marriage in 1969, they decided that all of their future endeavours would be credited to a conceptual and collaborative vehicle, Plastic Ono Band. <c22> The band would go on to feature a rotating lineup of many musicians, including Eric Clapton, Klaus Voormann, Alan White, Billy Preston, Jim Keltner, Delaney & Bonnie and Friends, and Lennon's former Beatles bandmates George Harrison and Ringo Starr. <c23> By the beginning of 1973, recording had begun in earnest on Ono's next album, Feeling the Space, featuring a new group of studio musicians. <c24> The newest incarnation of the Plastic Ono Band featured guitarist David Spinozza, keyboardist Ken Ascher, bassist Gordon Edwards, percussionists Arthur Jenkins and David Friedman, saxophonist Michael Brecker, pedal steel guitarist Sneaky Pete Kleinow, as well as regular contributor Jim Keltner. <c25> The album would be released in November. <c26> Throughout 1973, Lennon and Ono's relationship became strained. <c27> By August, the two had begun a period of separation that Lennon called \"The Lost Weekend\". <c28> Lennon began the recording of his own album, Mind Games, using the same players as on Feeling the Space, dubbed \"The Plastic U.F.Ono Band\". <c29> Around the time of the album's release in November, Lennon moved to Los Angeles with new lover May Pang. <c30> In October, Lennon began the recording of an album of rock 'n' roll oldies (a contractual obligation due to a lawsuit). <c31> These featured many Plastic Ono Band regulars (including much of the \"U.F.Ono Band\", Klaus Voorman, and the return of Phil Spector to the production chair), but upon release in 1975 as Rock 'n' Roll, it was credited to Lennon alone. <c32> The sessions for Rock 'n' Roll were extremely troubled, and the sessions were abandoned until a later date. <c33> In July 1974, Lennon returned to New York to record Walls and Bridges. <c34> The new \"Plastic Ono Nuclear Band\" featured both old and new faces, with Jim Keltner, Kenneth Ascher, and Arthur Jenkins continuing from Mind Games, the returns of Klaus Voorman, Nicky Hopkins, and Bobby Keys, and the addition of guitarists Jesse Ed Davis and Eddie Mottau. <c35> Recording was finished in August, and the album was released 26 September and 4 October in the US and UK respectively. <c36> Walls and Bridges would prove to be the last release of new material by the Plastic Ono Band in the 1970s. <c37> Lennon subsequently returned to his marriage with Ono and retired from music following the birth of his son Sean. <c38> The compilation Shaved Fish was released in October 1975, Lennon's last release credited to the Plastic Ono Band. <c39> Upon his and Ono's return to music in 1980 for the album Double Fantasy, they played with an all-new group of studio musicians who were not billed as any variation of the Plastic Ono Band name. <c40> Lennon was shot and killed shortly after the release of the album."},
        {"doc_id": "2", "text": "<c41> John Winston Ono Lennon (9 October 1940 - 8 December 1980) was an English singer, songwriter, and peace activist who co-founded the Beatles, the most commercially successful band in the history of popular music. <c42> He and fellow member Paul McCartney formed a much-celebrated songwriting partnership. <c43> Along with George Harrison and Ringo Starr, the group would ascend to world-wide fame during the 1960s. <c44> During his marriage to Cynthia, Lennon's first son Julian was born at the same time that his commitments with the Beatles were intensifying at the height of Beatlemania. <c45> Lennon was touring with the Beatles when Julian was born on 8 April 1963. <c46> Julian's birth, like his mother Cynthia's marriage to Lennon, was kept secret because Epstein was convinced that public knowledge of such things would threaten the Beatles' commercial success. <c47> Julian recalled that as a small child in Weybridge some four years later, \"I was trundled home from school and came walking up with one of my watercolour paintings. <c48> It was just a bunch of stars and this blonde girl I knew at school. <c49> And Dad said, 'What's this?' <c50> I said, 'It's Lucy in the sky with diamonds.'\" <c51> Lennon used it as the title of a Beatles song, and though it was later reported to have been derived from the initials LSD, Lennon insisted, \"It's not an acid song.\" <c52> McCartney corroborated Lennon's explanation that Julian innocently came up with the name. <c53> Lennon was distant from Julian, who felt closer to McCartney than to his father. <c54> During a car journey to visit Cynthia and Julian during Lennon's divorce, McCartney composed a song, \"Hey Jules\", to comfort him. <c55> It would evolve into the Beatles song \"Hey Jude\". <c56> Lennon later said, \"That's his best song. <c57> It started off as a song about my son Julian ... he turned it into 'Hey Jude'. <c58> I always thought it was about me and Yoko but he said it wasn't.\" <c59> Lennon's relationship with Julian was already strained, and after Lennon and Ono moved to Manhattan in 1971, Julian would not see his father again until 1973. <c60> With Pang's encouragement, arrangements were made for Julian (and his mother) to visit Lennon in Los Angeles, where they went to Disneyland. <c61> Julian started to see his father regularly, and Lennon gave him a drumming part on a Walls and Bridges track. <c62> He bought Julian a Gibson Les Paul guitar and other instruments, and encouraged his interest in music by demonstrating guitar chord techniques. <c63> Julian recalls that he and his father \"got on a great deal better\" during the time he spent in New York: \"We had a lot of fun, laughed a lot and had a great time in general.\" <c64> In a Playboy interview with David Sheff shortly before his death, Lennon said, \"Sean was a planned child, and therein lies the difference. <c65> I don't love Julian any less as a child. <c66> He's still my son, whether he came from a bottle of whiskey or because they didn't have pills in those days. <c67> He's here, he belongs to me, and he always will.\" <c68> He said he was trying to re-establish a connection with the then 17-year-old, and confidently predicted, \"Julian and I will have a relationship in the future.\" <c69> After his death it was revealed that he had left Julian very little in his will."},
    ]
    messages_rewritten = [
        {"role": "user", "content": "<c70> Who were the members of The Metal Ono Band, which was formed by Yoko Ono in 1976 to explore her interest in heavy metal music?"},
        {"role": "assistant", "content": "<c71> I'm sorry, but I don't have the data to answer that specific question."},
        {"role": "user", "content": "<c72> What was the concept behind the formation of the Plastic Ono Band?"},
        {"role": "assistant", "content": "<r0> The Plastic Ono Band was formed by John Lennon and Yoko Ono in 1969 as a collaborative vehicle for their artistic and personal projects. <r1> They decided to credit all their future efforts to this conceptual and collaborative group after their marriage in 1969."},
        {"role": "user", "content": "You provided the last assistant response above based on context, which may include documents and/or previous conversation turns. Your response is divided into sentences, numbered in the format <r0> sentence 0 <r1> sentence 1 ... Sentences in the context are also numbered: <c0> sentence 0 <c1> sentence 1 ... For each response sentence, please list the context sentences that were most important for you to generate the response sentence. Provide your answer in JSON format, as an array of JSON objects, where each object has two members: \"r\" with the response sentence number as the value, and \"c\" with an array of context sentence numbers as the value. An example of such an array of objects is [{\"r\": 0, \"c\": [3, 1, 4]}, {\"r\": 1, \"c\": [1, 5]}]. List the context sentences in order from most important to least important. Ensure that you include an object for each response sentence, even if the corresponding array of context sentence numbers is empty. Answer with only the JSON and do not explain.\n"},
    ]
# Call context attribution model to attribute each response sentence to context sentences
inputs = tokenizer.apply_chat_template(messages_rewritten, documents=documents_rewritten, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(device)
output_tokens = model_ca.generate(**inputs, max_new_tokens=4096, do_sample=False)
output_text = tokenizer.decode(output_tokens[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(output_text)
# [{"r": 0, "c": [0, 19]}, {"r": 1, "c": [2, 0, 1, 19, 3, 72, 70, 4, 71, 21]}]
# This means that response sentence 0 is attributed to context sentences 0 and 19,
# and response sentence 1 is attributed to many context sentences in decreasing order of importance, starting with sentence 2.
# (You may get slightly different orderings of context sentences for response sentence 1, depending on your hardware or environment.)
# This output could be post-processed using `granite-common`, for example to display or highlight the sentence texts.
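Without `granite-common`, one possible way to turn the printed indices back into sentence text is to read the `<cN>` tags directly from the tagged inputs. The sketch below is an illustrative assumption, not the library's post-processing: `index_context` and `render` are hypothetical helpers, and the regular expression assumes the tags appear exactly as `<c0>`, `<c1>`, and so on.

```python
import json
import re

# Capture each "<cN> sentence" pair up to the next tag or the end of the string.
TAG = re.compile(r"<c(\d+)>\s*(.*?)(?=\s*<c\d+>|$)", re.DOTALL)

def index_context(tagged_texts):
    """Map context sentence number -> sentence text from tagged strings."""
    lookup = {}
    for text in tagged_texts:
        for match in TAG.finditer(text):
            lookup[int(match.group(1))] = match.group(2)
    return lookup

def render(output_text, lookup, top_k=3):
    """List the top_k attributed context sentences for each response sentence."""
    lines = []
    for item in json.loads(output_text):
        lines.append(f"response sentence {item['r']}:")
        for c in item["c"][:top_k]:
            lines.append(f"  c{c}: {lookup.get(c, '<unknown>')}")
    return "\n".join(lines)
```

With the variables from the example above, the tagged strings could be collected as `[d["text"] for d in documents_rewritten] + [m["content"] for m in messages_rewritten[:-2]]`. The last two messages are excluded because the final instruction itself contains literal `<c0>` and `<c1>` tags that would otherwise overwrite real entries.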
Evaluation
The context attribution adapter was evaluated on four datasets: MD2D-QUAC, ELI5, CNN/Daily Mail (CNN/DM), and XSum. The first two are document-grounded, multi-turn question answering datasets and the last two are summarization datasets. MD2D-QUAC evaluates the adapter on a held-out test split of the dataset used for training (please see "Training Data" below for more details), ELI5 evaluates generalization to a different dataset within the same task, and CNN/DM and XSum evaluate generalization to a different task.
The evaluation metric is the area under the perturbation curve (AUPC), a standard metric for evaluating the faithfulness of a feature-attribution-like explanation to the model being explained. Please see the MExGen paper for more details on AUPC. Here, the perturbation curves are first multiplied by a linearly decreasing weight function, which decreases from 1 at 0% perturbation to 0 at 20% perturbation, yielding the weighted area under the perturbation curve (WAUPC).
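To make the weighting concrete, here is a minimal sketch of a weighted area computation under stated assumptions: the perturbation curve is given as values sampled at increasing perturbation fractions, the weight is w(p) = max(0, 1 - p/0.2), and the area is approximated with the trapezoidal rule. The function names are hypothetical, and how the perturbation curve itself is obtained is defined in the MExGen paper and not reproduced here.

```python
def weighted_aupc(fractions, values, cutoff=0.2):
    """Trapezoidal area under w(p) * f(p), with w(p) = max(0, 1 - p / cutoff).

    fractions: increasing perturbation fractions in [0, 1] (e.g. 0.0, 0.05, ...).
    values: perturbation-curve values at those fractions.
    """
    weighted = [max(0.0, 1.0 - p / cutoff) * v for p, v in zip(fractions, values)]
    area = 0.0
    for i in range(1, len(fractions)):
        width = fractions[i] - fractions[i - 1]
        area += 0.5 * (weighted[i] + weighted[i - 1]) * width
    return area

def normalized_waupc(method_area, loo_area):
    """Divide by the WAUPC of LOO, as done for the values reported in the table."""
    return method_area / loo_area
```

Because the weight reaches zero at 20% perturbation, only the early part of the curve contributes to the result.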
| method | MD2D-QUAC | ELI5 | CNN/DM | XSum | average |
|---|---|---|---|---|---|
| LOO (skyline) | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| LOO thresholded (realistic skyline) | 0.9780 | 0.9812 | 0.9976 | 0.9983 | 0.9888 |
| prompt 0-shot | 0.1476 | 0.1657 | 0.1629 | 0.2369 | 0.1783 |
| prompt 1-shot | 0.1982 | 0.2171 | 0.1693 | 0.2014 | 0.1965 |
| prompt GPT-OSS-120B 0-shot | 0.8619 | 0.8063 | 0.8735 | 0.6679 | 0.8024 |
| prompt GPT-OSS-120B 1-shot | 0.8779 | 0.8498 | 0.8996 | 0.8642 | 0.8729 |
| context attribution LoRA | 0.9360 | 0.9045 | 0.9158 | 0.9223 | 0.9197 |
The first row of the table shows the WAUPC for the leave-one-out (LOO) variant of the MExGen method, which the adapter aims to approximate. The second row corresponds to a thresholded version of LOO, which is the one that actually generates training data for the adapter. LOO and thresholded LOO thus represent "skyline" and "realistic skyline" methods that the adapter ideally would match. The WAUPC values in the table have been normalized by dividing by the WAUPC of LOO. As seen in the last row of the table, the adapter does well in approximating LOO, attaining at least 90% of its WAUPC on all datasets and 92.0% averaged across datasets.
The table compares the context attribution adapter to baselines that prompt an LLM to perform the context attribution task. These include prompting the granite-4.0-micro base model (labelled simply as "prompt" in the table), with and without providing an example (0-shot/1-shot), and prompting GPT-OSS-120B, also with 0-shot or 1-shot. (2-shot and 3-shot prompting were also evaluated but did not improve upon 1-shot, so are omitted from this table for brevity.) The context attribution adapter greatly improves granite-4.0-micro's ability to attribute its own responses to context, raising the average normalized WAUPC from 0.1965 (1-shot prompting) to 0.9197. The adapter also outperforms GPT-OSS-120B, whose best average is 0.8729 with 1-shot prompting, even though that LLM has around 40 times as many parameters (the LoRA adapter contributes a negligible number of parameters to the granite-4.0-micro base model).
Training Details
Granite 4.0 Micro Context Attribution is a LoRA adapter trained to approximate the importance ranking of context sentences provided by the MExGen method in [Monteiro Paes and Wei et al., ACL 2025] Multi-Level Explanations for Generative Language Models.
Training Data: The adapter was trained on a mixture of two datasets, MultiDoc2Dial and QUAC, which are both for document-grounded, multi-turn question answering. This mixture of datasets, referred to as MD2D-QUAC, was also used to train IBM Granite's citation generation adapter. 2053 question-answer conversation rounds were used for training and 1024 for validation. These numbers of instances were deliberately chosen to be modest to demonstrate that a well-performing adapter can be trained using this limited amount of data.
The MultiDoc2Dial and QUAC datasets consist of sets of grounding documents and multi-turn question-answering conversations based on the set of documents. For training the context attribution adapter, each conversation round (consisting of one user question followed by an assistant response), together with any conversation rounds that precede it, was treated as a separate "instance". The context for each instance includes the documents for the conversation as well as all conversation rounds except the last round, which is treated as the current round for the instance. granite-4.0-micro was called to re-generate a response to the question in the current round (since the purpose of the adapter is to attribute granite-4.0-micro's responses). The leave-one-out (LOO) variant of MExGen was then used to attribute each response sentence to context sentences, yielding "gold" attribution scores. LOO is thus treated as the "skyline" method in the Evaluation section above. The LOO attribution scores were thresholded to produce a controlled-length list of context sentences, specifically a list of context sentence numbers in decreasing order of importance. The context attribution adapter was trained to reproduce these lists for all response sentences, given the context, response, and an instruction as input.
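The general shape of this label-generation step can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration: `log_prob` stands in for a function that scores one response sentence given a list of context sentences, the LOO score is taken as the drop in that score when a sentence is removed, and the threshold and maximum list length are hypothetical parameters. The actual scoring and thresholding used for training are not specified in this document.

```python
def loo_scores(context, log_prob):
    """Leave-one-out: score drop when each context sentence is removed in turn."""
    base = log_prob(context)
    return [base - log_prob(context[:i] + context[i + 1:]) for i in range(len(context))]

def ranked_list(scores, threshold, max_len):
    """Keep indices scoring above the threshold, most important first, capped at max_len."""
    kept = [i for i, s in enumerate(scores) if s > threshold]
    kept.sort(key=lambda i: scores[i], reverse=True)
    return kept[:max_len]

# Toy scorer (hypothetical): each context sentence adds a fixed amount of support.
support = {"a": 2.0, "b": 0.0, "c": 0.5}
scores = loo_scores(["a", "b", "c"], lambda ctx: sum(support[s] for s in ctx))
print(ranked_list(scores, threshold=0.1, max_len=2))
```

In this toy setting, each entry of `scores` equals the support of the removed sentence, so the ranked list orders sentences by how much the score falls without them.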
Adapter Configurations
| Parameter | LoRA |
|---|---|
| Base model | ibm-granite/granite-4.0-micro |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, k_proj, v_proj |
| Output format | e.g. [{"r": 0, "c": [3, 1, 4]}, ... ] where "c" lists most important context sentences for response sentence "r" |
| Max completion tokens | 4096 |
| KV cache | Supported |
Infrastructure: The Granite 4.0 Micro Context Attribution LoRA adapter was trained on a single NVIDIA A100-80GB GPU.
Ethical Considerations: The context attribution adapter is designed specifically for Granite 4.0 Micro and was trained on its behavior. While it may be applied to other LLMs, it has not been validated for them. In addition, the context attributions may not always align with human judgments of which context sentences should matter.
Resources
- ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
- 📄 Get started with tutorials, best practices, and prompt engineering advice: https://www.ibm.com/granite/docs/
- 💡 Learn about the latest Granite learning resources: https://ibm.biz/granite-learning-resources