CoE-Wiki-CoE-8B

CoE-Wiki-CoE-8B is an 8B-parameter vision-language checkpoint fine-tuned for Chain-of-Evidence (CoE) question answering on the Wiki-CoE benchmark. Given a question and a set of candidate evidence screenshots, the model is trained to produce a structured answer together with the evidence chain that supports it.

This checkpoint is intended for research on multimodal QA, visual evidence selection, and evidence-grounded reasoning over document-like screenshots.

Expected input and output

The model expects:

  • a natural-language question
  • candidate screenshot images that may contain the supporting evidence

The expected output is a JSON-style response with:

  • evidence_chain: the selected supporting screenshots and localized evidence
  • answer: the final answer

For exact prompt formatting and evaluation scripts, see the project code.
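As a rough illustration, the JSON-style output described above can be parsed like this. The `evidence_chain` and `answer` keys come from this card; the internal structure of each evidence item (screenshot identifier plus localized evidence text) is an illustrative assumption, so consult the CoE repository for the authoritative schema.

```python
import json

# Example raw model output in the JSON-style format described above.
# The per-item fields ("screenshot", "evidence") are hypothetical;
# only the top-level keys are documented in this card.
raw_output = """
{
  "evidence_chain": [
    {"screenshot": "img_2", "evidence": "Founded in 1911 in Paris."}
  ],
  "answer": "1911"
}
"""

parsed = json.loads(raw_output)
evidence_chain = parsed["evidence_chain"]  # selected screenshots + localized evidence
answer = parsed["answer"]                  # final answer string
print(answer)  # -> 1911
```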

Usage

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

model_id = "PeiyangLiu/CoE-Wiki-CoE-8B"

# Processor bundles the tokenizer and image preprocessing for this checkpoint.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load the model in bfloat16; device_map="auto" places weights across
# available GPUs (or falls back to CPU).
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

Use the same image preprocessing and prompt format as the CoE repository for reproducible results.
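A minimal sketch of how a prompt might be assembled. The message layout below follows the common transformers chat format for image-text models, but the exact roles, content keys, and instruction wording used by CoE are assumptions; follow the CoE repository's prompt template for faithful reproduction.

```python
# Hypothetical prompt construction -- the question, file names, and
# instruction text are placeholders, not the official CoE template.
question = "In which year was the organization founded?"
screenshot_paths = ["page_1.png", "page_2.png"]  # candidate evidence screenshots

content = [{"type": "image", "image": p} for p in screenshot_paths]
content.append({
    "type": "text",
    "text": (
        f"Question: {question}\n"
        "Select the supporting screenshots and answer in JSON with "
        "keys 'evidence_chain' and 'answer'."
    ),
})
messages = [{"role": "user", "content": content}]

# With the processor and model loaded as above, inference would look
# roughly like (not run here, since it requires the model weights):
#   inputs = processor.apply_chat_template(
#       messages, add_generation_prompt=True,
#       tokenize=True, return_dict=True, return_tensors="pt",
#   ).to(model.device)
#   output_ids = model.generate(**inputs, max_new_tokens=512)
```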
