llm-calibration-benchmark

Apply for a GPU community grant: Academic project

by linweitao - opened 22 days ago

GPU Grant Request — LLM Calibration Benchmark

Who I am: I'm Linwei Tao, a PhD student at the University of Sydney researching calibration and uncertainty estimation in Large Vision-Language Models (LVLMs).

What this Space does: This Space hosts an interactive calibration benchmark for LVLMs. It lets researchers evaluate and compare the confidence calibration of models such as LLaVA, InstructBLIP, and Qwen-VL on standard multimodal QA benchmarks (VQAv2, POPE, MMBench). For each model it computes Expected Calibration Error (ECE) and plots reliability diagrams and confidence histograms.
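
For readers unfamiliar with ECE, here is a minimal sketch of the common equal-width-bin formulation: predictions are grouped by confidence, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. This is illustrative only and not necessarily how the Space implements it; the function name, the default of 10 bins, and the half-open bin convention are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - confidence| per bin.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans (or 0/1), True where the prediction was right.
    n_bins: number of equal-width bins (assumed default, not from the post).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Per-bin running totals: [count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += int(ok)

    ece = 0.0
    for count, conf_sum, n_correct in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum / count
        accuracy = n_correct / count
        ece += (count / n) * abs(accuracy - avg_conf)
    return ece
```

A model that answers with 0.9 confidence but is right only three times out of four gets an ECE of about 0.15 from this function, and a perfectly calibrated model gets 0. The per-bin accuracy and mean confidence computed in the loop are the same quantities a reliability diagram plots.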

Why I need GPU compute: Running calibration evaluation on large vision-language models (7B–13B parameters) requires significant GPU memory. The current CPU-only setup is too slow for interactive use: inference for a single model over a benchmark subset takes hours on CPU but would take minutes on a T4/L4 GPU. A GPU grant would make this tool practical for the research community to use interactively.

Impact: This is an open-science project. All code is public, and the benchmark is designed to help other researchers quickly audit the calibration of their own models. I plan to publish the methodology at a top-tier venue (NeurIPS/CVPR) and will acknowledge Hugging Face in the paper and Space README.

Institution: University of Sydney, School of Computer Science
Contact: linwei.tao@sydney.edu.au
Personal site: https://www.taolinwei.com
