XLM-RoBERTa Base ONNX (INT8 Quantized)

INT8 quantized ONNX export of xlm-roberta-base for efficient on-device inference.

Model Details

  • Base model: xlm-roberta-base (278M params)
  • Format: ONNX
  • Quantization: INT8 dynamic quantization
  • Size: ~509 MB (vs ~1.1 GB unquantized)

Usage

This model is designed for use with LoRA adapters applied at runtime. The graph is exported with do_constant_folding=False so that weight tensors remain separate initializers that can be patched in memory.

Files

  • xlm_roberta_base_int8.onnx - INT8 quantized ONNX model

Export Details

Exported using:

  • transformers
  • optimum
  • onnxruntime

Dynamic INT8 quantization was applied after export, roughly halving the model size with minimal accuracy loss.
