The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Paper • arXiv:2408.01050
Inference engines, quantization, serving stacks, and performance tooling: a reference list for deployment and latency/cost work.
Explore LLM benchmark trends over time
VLMEvalKit Evaluation Results Collection
VLMEvalKit Eval Results in video understanding benchmark