🔩 AMINI Q4_K_M GGUF — Tushe Foundry Edge Inference Pack

Quantised & packaged by Tushe – The Foundry Research Team


🧭 What Is This?

We are Tushe – The Foundry Research Team, and we are building a bare-metal inference engine for African language AI on constrained hardware.

This repository is part of our open-source model-baked-on-metal-inference-engine package — a lightweight inference runtime we are releasing as:

| Format | Use case |
|---|---|
| 🐍 Python library (`pip install tushe-bare-metal`) | Server, Raspberry Pi, edge Linux |
| 📦 npm package (`npm install tushe-bare-metal`) | Node.js apps, Electron, React Native |
| ⚙️ Compiled C executable | Bare-metal embedded, IoT, MCUs |

Every developer can drop this into their app and run offline African language inference right away — no internet, no cloud, no GPU required.


🎯 Why We Built This

Africa has some of the most resource-constrained connectivity environments in the world. Millions of people — rural doctors, farmers, teachers, students, traders, and tourists — need intelligent language tools but have no reliable internet access.

We took N-ATLaS, the Llama 3 8B model fine-tuned on Nigerian and African languages by Awarri Technologies in collaboration with NCAIR, and quantised and optimised it to run on low-resource hardware, including:

  • 📱 Android & iOS phones
  • 🌾 Edge IoT devices in agricultural fields
  • πŸ₯ Offline clinical/medical support tools in rural clinics
  • 🏫 Classrooms with no internet access
  • 🧭 Portable translator devices for traders and tourists

🌍 Target Use Cases

| Domain | Description |
|---|---|
| 🏥 Rural & Edge Medical | Doctors and health workers in remote clinics — symptom triage, patient communication, drug info in local languages |
| 🌾 Farmers Support | Modern and rural farmers — crop advice, weather interpretation, market prices, pest identification in Hausa, Igbo, Yoruba |
| 🏫 Education | Teachers and students in areas without internet — explanations, tutoring, literacy support in local languages |
| 🛒 Traders & Markets | Cross-language communication for traders and informal markets across Africa |
| ✈️ Tourists | Real-time offline translation across African languages |

🔩 About the Base Model

This GGUF is derived from NCAIR1/N-ATLaS — an open-source multilingual LLM built on Llama 3 8B, fine-tuned by Awarri Technologies in collaboration with the National Centre for Artificial Intelligence & Robotics (NCAIR) and the Federal Ministry of Communications, Innovation and Digital Economy of Nigeria.

N-ATLaS was trained on approximately 392 million multilingual tokens spanning English, Hausa, Igbo, and Yoruba.

We did not change the weights. We quantised the original model and built a highly optimised inference engine to enable edge deployment.


💾 This File

| File | Quant | Size | Min RAM |
|---|---|---|---|
| AMINI-q4_k_m.gguf | Q4_K_M | ~4.5 GB | 6–8 GB |

Q4_K_M is our recommended quant for edge deployment — the best balance of accuracy, speed, and memory. It runs on a phone with 8 GB RAM or a Raspberry Pi 5.
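Before loading, it can help to check that the device actually has the headroom listed above. Below is a minimal sketch for Linux-based edge targets (Raspberry Pi, or Android under Termux), assuming `/proc/meminfo` is readable; `available_ram_gb` and the 6 GB floor are our own names, taken from the table:

```python
def available_ram_gb(meminfo_path="/proc/meminfo"):
    """Return available RAM in GB by parsing MemAvailable (kB) from /proc/meminfo."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 ** 2)  # kB -> GB
    raise RuntimeError("MemAvailable not found in " + meminfo_path)

MIN_RAM_GB = 6  # lower bound for Q4_K_M from the table above

# Example gate before loading the model:
# if available_ram_gb() < MIN_RAM_GB:
#     raise SystemExit("Not enough free RAM for AMINI-q4_k_m.gguf")
```

On non-Linux platforms you would swap in the platform's own memory query; the gate itself stays the same.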


🚀 Inference Examples

1. Python — llama-cpp-python

pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id      = "AlaminI/AMINI-ASSISTANT-GGUF-Q4-B",
    filename     = "*.gguf",
    n_ctx        = 2048,
    n_gpu_layers = 0,      # 0 = CPU only (edge/offline), -1 = full GPU
    verbose      = False,
)

English — rural medical support:

output = llm(
    "A patient presents with fever, headache, and joint pain for 3 days. What are the possible diagnoses and first-line management?",
    max_tokens  = 256,
    temperature = 0.7,
    echo        = False,
)
print(output["choices"][0]["text"])

Hausa — farmer support:

output = llm(
    # "My grain farm has many pests. What can I do to protect my crops?"
    "Gonar hatsi na da kwari da yawa. Menene zan iya yi don kare amfanin gona na?",
    max_tokens  = 256,
    temperature = 0.7,
    echo        = False,
)
print(output["choices"][0]["text"])

Yoruba — student support:

output = llm(
    # "Explain what photosynthesis is, in Yoruba, for a student."
    "Ṣe alaye ohun ti photosynthesis jẹ ni ede Yoruba fun ọmọ ile-iwe.",
    max_tokens  = 256,
    temperature = 0.7,
    echo        = False,
)
print(output["choices"][0]["text"])

Igbo — trader/market support:

output = llm(
    # "Tell me the current price of maize in Onitsha market."
    "Gwa m ọnụ ahịa nke ọka ugbu a n'ahịa Onitsha.",
    max_tokens  = 256,
    temperature = 0.7,
    echo        = False,
)
print(output["choices"][0]["text"])

2. Chat format — multilingual instruction

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id      = "AlaminI/AMINI-ASSISTANT-GGUF-Q4-B",
    filename     = "*.gguf",
    n_ctx        = 2048,
    n_gpu_layers = 0,
    verbose      = False,
)

response = llm.create_chat_completion(
    messages = [
        {
            "role": "system",
            "content": (
                "You are an offline African language assistant running on a local device. "
                "You support English, Hausa, Igbo, and Yoruba. "
                "Respond in the same language the user writes in. "
                "Be concise — this device has limited resources."
            )
        },
        {
            "role": "user",
            "content": "Translate 'The child has a high fever and needs immediate care' into Hausa and Yoruba."
        }
    ],
    max_tokens  = 256,
    temperature = 0.7,
)
print(response["choices"][0]["message"]["content"])

3. Streaming (for responsive UIs on edge devices)

stream = llm.create_chat_completion(
    messages = [
        {"role": "user", "content": "Explain crop rotation to a farmer in Hausa."}
    ],
    max_tokens = 256,
    temperature = 0.7,
    stream     = True,
)

for chunk in stream:
    delta = chunk["choices"][0]["delta"].get("content", "")
    print(delta, end="", flush=True)

4. Node.js — node-llama-cpp

npm install node-llama-cpp
import { getLlama, LlamaChatSession } from "node-llama-cpp";
import path from "path";

const llama   = await getLlama();
const model   = await llama.loadModel({ modelPath: path.join("models", "AMINI-q4_k_m.gguf") });
const context = await model.createContext({ contextSize: 2048 });
const session = new LlamaChatSession({ contextSequence: context.getSequence() });

const response = await session.prompt(
    "A farmer asks: my tomatoes are wilting despite regular watering. What could be wrong?",
    { maxTokens: 256 }
);
console.log(response);

5. llama.cpp CLI (bare-metal / embedded)

# Download
huggingface-cli download AlaminI/AMINI-ASSISTANT-GGUF-Q4-B \
    AMINI-q4_k_m.gguf --local-dir ./models/

# Run on CPU only (edge device)
./llama-cli -m ./models/AMINI-q4_k_m.gguf \
    --ctx-size 2048 \
    --threads 4 \
    --temp 0.7 \
    -i -r "User:" \
    -p "You are an offline assistant for African languages. Respond in the user's language.\nUser:"

6. Ollama (local server mode)

ollama run hf.co/AlaminI/AMINI-ASSISTANT-GGUF-Q4-B

βš™οΈ Recommended Settings for Edge Devices

| Parameter | Value | Notes |
|---|---|---|
| `n_ctx` | 512–1024 | Reduce on very low RAM devices |
| `n_gpu_layers` | 0 | CPU-only for phones/IoT |
| `n_threads` | 4 | Match your device's core count |
| `temperature` | 0.7 | Balanced responses |
| `max_tokens` | 128–256 | Keep short for low-latency UX |
| `repeat_penalty` | 1.1 | Reduces looping on edge |
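These settings map onto llama-cpp-python's load-time and call-time parameters. A minimal sketch of keeping them in one place (the load/call split below is our own convention, not part of the library):

```python
# Load-time settings (passed to the Llama constructor / from_pretrained)
EDGE_LOAD_SETTINGS = {
    "n_ctx": 1024,         # reduce to 512 on very low RAM devices
    "n_gpu_layers": 0,     # CPU-only for phones/IoT
    "n_threads": 4,        # match your device's core count
}

# Call-time settings (passed to each completion call)
EDGE_CALL_SETTINGS = {
    "max_tokens": 256,      # keep short for low-latency UX
    "temperature": 0.7,     # balanced responses
    "repeat_penalty": 1.1,  # reduces looping on edge
}

# Usage:
# llm = Llama.from_pretrained(repo_id="AlaminI/AMINI-ASSISTANT-GGUF-Q4-B",
#                             filename="*.gguf", **EDGE_LOAD_SETTINGS)
# output = llm("Explain crop rotation in Hausa.", **EDGE_CALL_SETTINGS)
```

Centralising the values this way keeps every example in this card consistent and makes per-device tuning a one-line change.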

🖥️ Tested Hardware Targets

| Device | RAM | n_gpu_layers | Speed (tok/s) |
|---|---|---|---|
| Raspberry Pi 5 | 8 GB | 0 (CPU) | ~2–4 |
| Android phone (8 GB) | 8 GB | 0 (CPU) | ~3–6 |
| Laptop (no GPU) | 16 GB | 0 (CPU) | ~8–15 |
| Laptop (GPU, 6 GB) | 16 GB | -1 (GPU) | ~30–60 |
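Figures like these can be reproduced on your own hardware with a small timing harness; a minimal sketch, assuming llama-cpp-python's completion dict with its `usage.completion_tokens` field (`tokens_per_second` is our own name):

```python
import time

def tokens_per_second(llm, prompt, n_tokens=64):
    """Time one fixed-length generation and return throughput in tokens/second."""
    start = time.perf_counter()
    out = llm(prompt, max_tokens=n_tokens, temperature=0.0)
    elapsed = time.perf_counter() - start
    generated = out["usage"]["completion_tokens"]  # tokens actually produced
    return generated / elapsed

# Usage (llm is a loaded llama_cpp.Llama instance):
# print(f"{tokens_per_second(llm, 'Explain crop rotation in Hausa.'):.1f} tok/s")
```

Run it a few times and discard the first result, since the first call also pays prompt-processing and cache-warmup costs.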

🔗 Related Repositories

| Repo | Description |
|---|---|
| NCAIR1/N-ATLaS | Original model by Awarri / NCAIR |
| AlaminI/AMINI-ASSISTANT-GGUF-Q4-B | F16 GGUF (full precision, for re-quantising) |

⚠️ License

This GGUF quantisation is an independent contribution by Tushe – The Foundry Research Team. We encourage developers to refer to the N-ATLaS licence for the terms covering the model weights. Our inference engine itself can be used for any purpose, commercial or otherwise, with no limit on user numbers. We will also release models trained by us, to give developers fully open-source models and inference at the edge.

