
Agri-llama: Official Vision-Language Model by Aqib Mehedi

Agri-llama is a powerful multimodal model designed for advanced agricultural analysis, combining visual perception with sophisticated language reasoning. It is optimized for both precision vision tasks and interactive agricultural consultation.

🌟 Key Features

  • Vision-Language Integration: Analyze agricultural imagery (crops, pests, soil) alongside textual queries.
  • High Performance: Optimized for efficiency and accuracy in specialized domains.
  • Multimodal Chat: Interactive dialogue support for complex agricultural problem-solving.
  • Flexible Deployment: Available in both Hugging Face Safetensors (FP16) and GGUF (Quantized) formats.

💻 System Configuration (Development Environment)

This model was developed and verified on the following configuration:

  • OS: Microsoft Windows 11 Pro (Build 26200)
  • GPU: NVIDIA GeForce RTX 3060 (12GB VRAM)
  • CUDA: 13.0
  • Driver: 581.29
  • Python: 3.11.9

Recommended Hardware for Inference

  • GGUF (Quantized): 8GB+ VRAM (Full GPU offloading) or 16GB+ System RAM (CPU only).
  • Safetensors (FP16): 12GB+ VRAM (NVIDIA RTX 3060 or better recommended).
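The recommendations above can be expressed as a small selection helper. A minimal sketch (the thresholds mirror the list above; the function name is made up for illustration and is not part of the repository):

```python
def pick_format(free_vram_gb: float, free_ram_gb: float) -> str:
    """Suggest a model format from available memory, following the
    recommendations above (illustrative helper only)."""
    if free_vram_gb >= 12:
        return "safetensors-fp16"   # full-precision weights fit on the GPU
    if free_vram_gb >= 8:
        return "gguf-gpu"           # quantized weights, full GPU offloading
    if free_ram_gb >= 16:
        return "gguf-cpu"           # quantized weights, CPU-only inference
    return "insufficient-memory"

print(pick_format(12, 32))  # safetensors-fp16
```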

🚀 Beginner's Quick Start Guide

Follow these steps to get Agri-llama running on your local machine.

1. Prerequisites

Ensure you have Python 3.10+ and Git installed. If you have an NVIDIA GPU, install the CUDA Toolkit.
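You can confirm the prerequisites from a terminal (sample commands; the versions reported on your machine will differ):

```shell
python --version    # should report 3.10 or newer
git --version
# NVIDIA GPU users: confirm the driver is installed and the GPU is visible
# nvidia-smi
```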

2. Setup Environment

Open your terminal (PowerShell or Command Prompt) and run:

# Clone the repository
git clone https://huggingface.co/aqibcareer007/agri-llama
cd agri-llama

# Create a virtual environment (recommended)
python -m venv venv
venv\Scripts\activate        # Windows (PowerShell or Command Prompt)
# On Linux/macOS use: source venv/bin/activate

# Install requirements
pip install -r requirements.txt

3. Running the Model

Option A: Interactive Chat (Easiest for Beginners)

This uses the quantized GGUF model which is fast and memory-efficient.

python scripts/run_chat.py

Wait for the 🤖 Model Loaded! message and start typing your agricultural questions.
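If you would rather drive the GGUF build from your own script, a minimal sketch using llama-cpp-python could look like the following. The GGUF filename and generation settings are placeholders (check the repository for the actual filename), and scripts/run_chat.py may use a different loader entirely:

```python
# Sketch: a chat helper over the quantized GGUF build via llama-cpp-python.
# "agri-llama-Q4_K_M.gguf" is a placeholder filename -- use the file
# actually shipped in the repository.

def chat_gguf(question: str,
              model_path: str = "agri-llama-Q4_K_M.gguf") -> str:
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=model_path,
        n_ctx=4096,        # context window for this session
        n_gpu_layers=-1,   # offload every layer to the GPU; use 0 for CPU-only
    )
    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        max_tokens=200,
    )
    return reply["choices"][0]["message"]["content"]
```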

Option B: Vision Analysis (Python API)

Use this for programmatically analyzing images.

import torch
from transformers import AutoProcessor, AgriLlamaForConditionalGeneration
from PIL import Image

model_id = "aqibcareer007/agri-llama"

# Load Model
model = AgriLlamaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Example: Analyze a crop image
image = Image.open("path_to_your_crop_image.jpg")
prompt = "<bos><start_of_turn>user\n<image>\nWhat is wrong with this leaf?<end_of_turn>\n<start_of_turn>model\n"

# Send inputs to the same device the model was placed on (works on CPU too)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)

print(processor.decode(output[0], skip_special_tokens=True))
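The chat-template string in the snippet above is easy to get wrong by hand; a small helper (illustrative only, not part of the repository) keeps it in one place:

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the Gemma-style turn format used by the
    vision example above, with an <image> placeholder for the picture."""
    return (
        "<bos><start_of_turn>user\n<image>\n"
        + question
        + "<end_of_turn>\n<start_of_turn>model\n"
    )

print(build_prompt("What is wrong with this leaf?"))
```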

🛠 Technical Specifications

  • Model Type: Multimodal (Vision + Text)
  • Base Architecture: 4B Parameters
  • Context Window: 131,072 tokens
  • Quantization: GGUF Q4_K_M (Included for efficiency)
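A back-of-envelope check of why the hardware recommendations above work out (rough figures only; runtime overhead for activations and the KV cache comes on top of the weights):

```python
params = 4e9  # 4B parameters

fp16_gb = params * 2 / 1e9         # FP16: 2 bytes per weight
q4km_gb = params * 4.85 / 8 / 1e9  # Q4_K_M: roughly 4.85 bits per weight

print(f"FP16 weights:   ~{fp16_gb:.0f} GB")
print(f"Q4_K_M weights: ~{q4km_gb:.1f} GB")
```

This is why the FP16 Safetensors build wants a 12GB-class GPU, while the Q4_K_M GGUF build fits comfortably in 8GB of VRAM.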

📄 License

[Insert License Here - e.g., Apache 2.0]

🤝 Acknowledgments

Developed by Aqib Mehedi, Senior AI Engineer at Kamal-Paterson Ltd. For support or inquiries, please visit the official Hugging Face repository.
