
Agri-llama: Official Vision-Language Model by Aqib Mehedi

Agri-llama is a powerful multimodal model designed for advanced agricultural analysis, combining visual perception with sophisticated language reasoning. It is optimized for both precision vision tasks and interactive agricultural consultation.

🌟 Key Features

  • Vision-Language Integration: Analyze agricultural imagery (crops, pests, soil) alongside textual queries.
  • High Performance: Optimized for efficiency and accuracy in specialized domains.
  • Multimodal Chat: Interactive dialogue support for complex agricultural problem-solving.
  • Flexible Deployment: Available in both Hugging Face Safetensors (FP16) and GGUF (Quantized) formats.

💻 System Configuration (Development Environment)

This model was developed and verified on the following configuration:

  • OS: Microsoft Windows 11 Pro (Build 26200)
  • GPU: NVIDIA GeForce RTX 3060 (12GB VRAM)
  • CUDA: 13.0
  • Driver: 581.29
  • Python: 3.11.9

Recommended Hardware for Inference

  • GGUF (Quantized): 8GB+ VRAM (Full GPU offloading) or 16GB+ System RAM (CPU only).
  • Safetensors (FP16): 12GB+ VRAM (NVIDIA RTX 3060 or better recommended).
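The recommendations above can be expressed as a small selection helper. A minimal sketch (the thresholds mirror the list above; the function name is made up for illustration and is not part of the repository):

```python
def pick_format(free_vram_gb: float, free_ram_gb: float) -> str:
    """Suggest a model format from available memory, following the
    recommendations above (illustrative helper only)."""
    if free_vram_gb >= 12:
        return "safetensors-fp16"   # full-precision weights fit on the GPU
    if free_vram_gb >= 8:
        return "gguf-gpu"           # quantized weights, full GPU offloading
    if free_ram_gb >= 16:
        return "gguf-cpu"           # quantized weights, CPU-only inference
    return "insufficient-memory"

print(pick_format(12, 32))  # safetensors-fp16
```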

🚀 Beginner's Quick Start Guide

Follow these steps to get Agri-llama running on your local machine.

1. Prerequisites

Ensure you have Python 3.10+ and Git installed. If you have an NVIDIA GPU, install the CUDA Toolkit.
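You can confirm the prerequisites from a terminal (sample commands; the versions reported on your machine will differ):

```shell
python --version    # should report 3.10 or newer
git --version
# NVIDIA GPU users: confirm the driver is installed and the GPU is visible
# nvidia-smi
```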

2. Setup Environment

Open your terminal (PowerShell or Command Prompt) and run:

# Clone the repository
git clone https://huggingface.co/aqibcareer007/agri-llama
cd agri-llama

# Create a virtual environment (recommended)
python -m venv venv
venv\Scripts\activate        # Windows (PowerShell or Command Prompt)
# On Linux/macOS use: source venv/bin/activate

# Install requirements
pip install -r requirements.txt

3. Running the Model

Option A: Interactive Chat (Easiest for Beginners)

This uses the quantized GGUF model which is fast and memory-efficient.

python scripts/run_chat.py

Wait for the 🤖 Model Loaded! message and start typing your agricultural questions.
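If you would rather drive the GGUF build from your own script, a minimal sketch using llama-cpp-python could look like the following. The GGUF filename and generation settings are placeholders (check the repository for the actual filename), and scripts/run_chat.py may use a different loader entirely:

```python
# Sketch: a chat helper over the quantized GGUF build via llama-cpp-python.
# "agri-llama-Q4_K_M.gguf" is a placeholder filename -- use the file
# actually shipped in the repository.

def chat_gguf(question: str,
              model_path: str = "agri-llama-Q4_K_M.gguf") -> str:
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=model_path,
        n_ctx=4096,        # context window for this session
        n_gpu_layers=-1,   # offload every layer to the GPU; use 0 for CPU-only
    )
    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        max_tokens=200,
    )
    return reply["choices"][0]["message"]["content"]
```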

Option B: Vision Analysis (Python API)

Use this for programmatically analyzing images.

import torch
from transformers import AutoProcessor, AgriLlamaForConditionalGeneration
from PIL import Image

model_id = "aqibcareer007/agri-llama"

# Load Model
model = AgriLlamaForConditionalGeneration.from_pretrained(
    model_id, 
    torch_dtype=torch.float16, 
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Example: Analyze a crop image
image = Image.open("path_to_your_crop_image.jpg")
prompt = "<bos><start_of_turn>user\n<image>\nWhat is wrong with this leaf?<end_of_turn>\n<start_of_turn>model\n"

# Send inputs to the same device the model was placed on (works on CPU too)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)

print(processor.decode(output[0], skip_special_tokens=True))
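The chat-template string in the snippet above is easy to get wrong by hand; a small helper (illustrative only, not part of the repository) keeps it in one place:

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the Gemma-style turn format used by the
    vision example above, with an <image> placeholder for the picture."""
    return (
        "<bos><start_of_turn>user\n<image>\n"
        + question
        + "<end_of_turn>\n<start_of_turn>model\n"
    )

print(build_prompt("What is wrong with this leaf?"))
```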

🛠 Technical Specifications

  • Model Type: Multimodal (Vision + Text)
  • Base Architecture: 4B Parameters
  • Context Window: 131,072 tokens
  • Quantization: GGUF Q4_K_M (Included for efficiency)
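A back-of-envelope check of why the hardware recommendations above work out (rough figures only; runtime overhead for activations and the KV cache comes on top of the weights):

```python
params = 4e9  # 4B parameters

fp16_gb = params * 2 / 1e9         # FP16: 2 bytes per weight
q4km_gb = params * 4.85 / 8 / 1e9  # Q4_K_M: roughly 4.85 bits per weight

print(f"FP16 weights:   ~{fp16_gb:.0f} GB")
print(f"Q4_K_M weights: ~{q4km_gb:.1f} GB")
```

This is why the FP16 Safetensors build wants a 12GB-class GPU, while the Q4_K_M GGUF build fits comfortably in 8GB of VRAM.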

📄 License

[Insert License Here - e.g., Apache 2.0]

🤝 Acknowledgments

Developed by Aqib Mehedi, Senior AI Engineer at Kamal-Paterson Ltd. For support or inquiries, please visit the official Hugging Face repository.
