Llama-3.1-8B-Instruct-OpenVINO-INT4 (Platinum Series)


This repository contains the Platinum Series OpenVINO INT4 release of Llama-3.1-8B-Instruct. This export leverages Mixed-Precision Weight Compression and Data-Free AWQ to maintain the massive 128k context window while significantly reducing the memory footprint for edge deployment.
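As a rough back-of-envelope check of the memory savings, the weight footprint can be estimated from the bitwidth mix. This is illustrative arithmetic only: the 8.03B parameter count is an assumption based on the published Llama 3.1 8B size, and the real footprint also includes the KV cache, activations, and per-group scales/zero-points.

```python
# Back-of-envelope weight-memory estimate (illustrative only).
PARAMS = 8.03e9  # assumed parameter count for Llama 3.1 8B

def weight_gib(bits_per_param: float) -> float:
    """Convert an average bits-per-parameter figure into GiB of weights."""
    return PARAMS * bits_per_param / 8 / 2**30

fp16 = weight_gib(16)                   # full-precision baseline
mixed = weight_gib(0.7 * 4 + 0.3 * 8)   # 70% INT4 / 30% INT8 average

print(f"FP16 weights:       {fp16:.1f} GiB")
print(f"INT4/INT8 mix:      {mixed:.1f} GiB")
```

The mix averages to 5.2 bits per parameter, cutting the weight footprint to roughly a third of FP16, which is what makes edge deployment with the full context window practical.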

💎 Optimization Details

Based on the NNCF (Neural Network Compression Framework) compression diagnostics:

  • Mixed-Precision Strategy: 70% (156/224) of ratio-defining parameters are compressed to INT4_ASYM (group size 64) for maximum speed.
  • Accuracy Preservation: 30% (68/224) of parameters remain in INT8_ASYM (per-channel) to protect critical attention and normalization layers.
  • AWQ Calibration: Applied data-free AWQ to minimize quantization error across 32 key blocks.
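To make the INT4_ASYM scheme above concrete, here is a minimal NumPy illustration of group-wise asymmetric 4-bit quantization with group size 64. This is a pedagogical sketch with invented function names, not the NNCF implementation, and it omits AWQ scaling entirely.

```python
import numpy as np

def quantize_int4_asym(w: np.ndarray, group_size: int = 64):
    """Sketch of group-wise asymmetric 4-bit quantization (not NNCF's code).

    Each group of 64 weights gets its own scale and zero-point, mapping
    the group's [min, max] range onto the unsigned 4-bit grid 0..15.
    """
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                      # 4-bit range: 0..15
    zero = np.round(-lo / scale)                  # per-group zero-point
    q = np.clip(np.round(w / scale + zero), 0, 15)
    return q.astype(np.uint8), scale, zero

def dequantize(q: np.ndarray, scale: np.ndarray, zero: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) - zero) * scale

# Round-trip a random weight tensor and measure the worst-case error.
rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s, z = quantize_int4_asym(w)
err = float(np.abs(dequantize(q, s, z).reshape(-1) - w).max())
print(f"max abs reconstruction error: {err:.4f}")
```

The per-group scale/zero-point pairs are why INT4_ASYM keeps small groups (64 here): the smaller the group, the tighter each group's range and the smaller the rounding error, at the cost of storing more scales.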

🐍 Python Inference (OpenVINO GenAI)

```python
from openvino_genai import LLMPipeline

# Load the Platinum Engine from the local export directory
pipe = LLMPipeline("Llama-3.1-8B-Instruct-OpenVINO-INT4", "CPU")

# 128k context ready for 2026 RBI Directions
print(pipe.generate("Analyze the 2026 RBI Internal Ombudsman Directions.", max_new_tokens=512))
```

💻 C# / .NET Users (OpenVINO.GenAI.CSharp)

This collection is fully compatible with .NET applications via the OpenVINO.GenAI C# API, ideal for integrating into corporate tools.

```csharp
using OpenVino.GenAI;

// Initialize the Platinum Engine
var pipeline = new LLMPipeline("Llama-3.1-8B-Instruct-OpenVINO-INT4", "CPU");

// Generate reasoning for Indian Finance
var result = pipeline.Generate("Explain the 2026 RBI regulatory framework.", max_new_tokens: 512);
Console.WriteLine(result);
```

πŸ—οΈ Technical Forge

  • Optimization Tool: optimum-cli / NNCF (2026-03-29)
  • Bitwidth Distribution: 70% INT4 / 30% INT8 Mixed-Precision
  • Calibration: Data-Free AWQ (32 steps)
  • Workstation: Dual-GPU (NVIDIA RTX 3090 24GB + RTX A4000 16GB)
  • Infrastructure: S: NVMe Scratch / K: 12TB Warehouse
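A matching export recipe can be sketched with optimum-cli. Treat this as an assumption-laden sketch rather than the exact command used for this release: the flag names follow optimum-intel's OpenVINO export CLI, but the available AWQ and dataset options vary between optimum-intel/NNCF versions, so verify against your installed tooling.

```shell
# Hypothetical re-export sketch: INT4 asymmetric weights, group size 64,
# 70% INT4 ratio, AWQ enabled. Verify flags against your optimum-intel version.
optimum-cli export openvino \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --weight-format int4 \
  --ratio 0.7 \
  --group-size 64 \
  --awq \
  Llama-3.1-8B-Instruct-OpenVINO-INT4
```

The `--ratio 0.7` and `--group-size 64` values mirror the 70% INT4_ASYM / group-size-64 distribution reported in the diagnostics above.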

☕ Support the Forge

Maintaining the production line for high-fidelity models requires significant hardware resources. If these tools power your research or industrial projects, please consider supporting the development:

  • Global & India: Support via Razorpay

Scan to support via UPI (India Only):


Connect with the architect: Abhishek Jaiswal on LinkedIn
