# XLM-RoBERTa Base ONNX (INT8 Quantized)

INT8-quantized ONNX export of `xlm-roberta-base` for efficient on-device inference.
## Model Details

- Base model: `xlm-roberta-base` (278M params)
- Format: ONNX
- Quantization: INT8 dynamic quantization
- Size: ~509MB (vs ~1.1GB unquantized)
## Usage

This model is designed for use with LoRA adapters applied at runtime. The weights are exported with `do_constant_folding=False` to allow in-memory patching.
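The runtime patching described above can be sketched with the standard LoRA merge, where the effective weight is `W + (alpha / r) * (B @ A)`; the shapes, names, and values below are illustrative assumptions, not taken from this model's graph:

```python
import numpy as np

# Minimal sketch of merging a LoRA adapter into a base weight in memory.
# A LoRA adapter stores two low-rank factors: A (r x k) and B (d x r).
rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 4.0

W = rng.standard_normal((d, k)).astype(np.float32)  # base weight (e.g. from the ONNX graph)
A = rng.standard_normal((r, k)).astype(np.float32)  # adapter down-projection
B = np.zeros((d, r), dtype=np.float32)              # adapter up-projection (initialized to zero)

delta = (alpha / r) * (B @ A)
W_patched = W + delta  # this merged tensor would be written back into the graph in memory
```

Because `B` starts at zero, a freshly initialized adapter leaves the base weights unchanged; training moves `B` away from zero and the merged delta becomes nontrivial.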
## Files

- `xlm_roberta_base_int8.onnx` - INT8-quantized ONNX model
## Export Details

Exported using:

- `transformers`
- `optimum`
- `onnxruntime`

Dynamic INT8 quantization reduces model size while largely preserving accuracy.
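The principle behind dynamic INT8 quantization can be sketched in NumPy; this illustrates the symmetric per-tensor scheme in general, not the exact implementation inside `onnxruntime`:

```python
import numpy as np

# Weights are stored as int8 plus a floating-point scale; they are
# dequantized (and activations quantized on the fly) at inference time.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # a toy weight tensor

scale = np.abs(w).max() / 127.0                       # symmetric per-tensor scale
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_deq = w_int8.astype(np.float32) * scale             # what the runtime sees at matmul time

max_err = float(np.abs(w - w_deq).max())              # bounded by about scale / 2
```

Storing 1 byte per weight instead of 4 is what drives the ~509MB vs ~1.1GB size difference quoted above (some tensors and metadata remain in higher precision).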