# XLM-RoBERTa Base ONNX (INT8 Quantized)

INT8-quantized ONNX export of `xlm-roberta-base` for efficient on-device inference.
## Model Details

- Base model: `xlm-roberta-base` (278M params)
- Format: ONNX
- Quantization: INT8 dynamic quantization
- Size: ~509MB (vs ~1.1GB unquantized)
## Usage

This model is designed for use with LoRA adapters applied at runtime. The weights are exported with `do_constant_folding=False` to allow in-memory patching.
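The runtime patching described above can be sketched with the standard LoRA merge, where the effective weight is `W + (alpha / r) * (B @ A)`; the shapes, names, and values below are illustrative assumptions, not taken from this model's graph:

```python
import numpy as np

# Minimal sketch of merging a LoRA adapter into a base weight in memory.
# A LoRA adapter stores two low-rank factors: A (r x k) and B (d x r).
rng = np.random.default_rng(0)
d, k, r, alpha = 8, 8, 2, 4.0

W = rng.standard_normal((d, k)).astype(np.float32)  # base weight (e.g. from the ONNX graph)
A = rng.standard_normal((r, k)).astype(np.float32)  # adapter down-projection
B = np.zeros((d, r), dtype=np.float32)              # adapter up-projection (initialized to zero)

delta = (alpha / r) * (B @ A)
W_patched = W + delta  # this merged tensor would be written back into the graph in memory
```

Because `B` starts at zero, a freshly initialized adapter leaves the base weights unchanged; training moves `B` away from zero and the merged delta becomes nontrivial.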
## Files

- `xlm_roberta_base_int8.onnx` - INT8-quantized ONNX model
## Export Details

Exported using:

- `transformers`
- `optimum`
- `onnxruntime`

Dynamic INT8 quantization reduces model size while largely preserving accuracy.
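The principle behind dynamic INT8 quantization can be sketched in NumPy; this illustrates the symmetric per-tensor scheme in general, not the exact implementation inside `onnxruntime`:

```python
import numpy as np

# Weights are stored as int8 plus a floating-point scale; they are
# dequantized (and activations quantized on the fly) at inference time.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # a toy weight tensor

scale = np.abs(w).max() / 127.0                       # symmetric per-tensor scale
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_deq = w_int8.astype(np.float32) * scale             # what the runtime sees at matmul time

max_err = float(np.abs(w - w_deq).max())              # bounded by about scale / 2
```

Storing 1 byte per weight instead of 4 is what drives the ~509MB vs ~1.1GB size difference quoted above (some tensors and metadata remain in higher precision).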