Phi-4 Mini African History (GGUF)

This repository contains GGUF versions of the Phi-4-mini-instruct base model and the LoRAfrica adapter, specialized for African History QA.

  • Base Model: phi4_mini_instruct_q4_k_m.gguf
  • LoRA Adapter: lorafrica_lora_adapter.gguf
  • Training Data: DannyAI/African-History-QA-Dataset

Model Details

Model Description

This model is a quantized GGUF build of a LoRA fine-tune of Microsoft's Phi-4-mini. It is optimized for efficient, accurate question answering on African history, from ancient empires to contemporary social movements.

  • Developed by: Daniel Ihenacho
  • Funded by: Daniel Ihenacho
  • Shared by: Daniel Ihenacho
  • Model type: Text Generation (GGUF)
  • Language(s) (NLP): English
  • License: MIT
  • Finetune Method: Axolotl (LoRA)

Uses

This model is designed for low-latency inference on consumer hardware using llama.cpp or other GGUF-compatible backends (Ollama, LM Studio, etc.).

Primary Use Case

  • QA datasets and educational tools focused on African History.

Out-of-Scope Use

  • General-purpose coding or mathematical reasoning (performance may be degraded compared to the base Phi-4).
  • Production environments requiring high-stakes factual guarantees without human oversight.

How to Get Started with the Model

Since this repository contains a separate GGUF base and a GGUF LoRA adapter, you can use them together in llama.cpp without needing to merge them permanently.

Downloading the models

hf download DannyAI/LoRAfrica_GGUF --local-dir ./gguf_model
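After downloading, you can sanity-check that both files are valid GGUF containers. The sketch below is a hypothetical helper (not part of this repo) that reads the first four bytes of each file and compares them against the GGUF magic bytes `GGUF`; the paths assume the `--local-dir` used above:

```python
from pathlib import Path

def looks_like_gguf(path) -> bool:
    """Return True if the file begins with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Check both files from the download step above
for name in ("phi4_mini_instruct_q4_k_m.gguf", "lorafrica_lora_adapter.gguf"):
    p = Path("./gguf_model") / name
    if p.exists():
        print(f"{name}: {'OK' if looks_like_gguf(p) else 'NOT a GGUF file'}")
    else:
        print(f"{name}: missing")
```

A failed check usually means a truncated or interrupted download; re-running the `hf download` command resumes it.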

Using llama-cli

NB: These commands are for the Windows Command Prompt (CMD); `^` is the CMD line-continuation character. On Linux/macOS, use `./build/bin/llama-cli` and `\` instead.

.\build\bin\Release\llama-cli -m phi4_mini_instruct_q4_k_m.gguf ^
 --lora lorafrica_lora_adapter.gguf ^
 -p "<|system|>You are a helpful AI assistant specialised in African history.<|end|><|user|>Briefly detail the significance of the Igbo god Amadioha.<|end|><|assistant|>" ^
 -n 128

Using llama-server (preferred)

NB: This is for CMD on Windows

.\build\bin\Release\llama-server -m phi4_mini_instruct_q4_k_m.gguf ^
 --lora lorafrica_lora_adapter.gguf --host 0.0.0.0 --port 8080

You can also open the built-in web UI by pasting http://localhost:8080 into your browser.

  • This brings up a chat interface you can use instead of sending requests from CMD. The server runs on your local machine; the API endpoint for programmatic requests is http://localhost:8080/v1/chat/completions.
# Send a Request
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d "{\"messages\": [{\"role\": \"system\", \"content\": \"You are a helpful AI assistant specialised in African history.\"}, {\"role\": \"user\", \"content\": \"Briefly detail the significance of the Igbo god Amadioha.\"}], \"max_tokens\": 128, \"temperature\": 0.1}"
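The same request can be sent from Python using only the standard library. This is a sketch against the server started above; `build_chat_payload` is a hypothetical helper, and the POST (commented out) assumes the server is running on localhost:8080:

```python
import json
import urllib.request

def build_chat_payload(question: str, max_tokens: int = 128,
                       temperature: float = 0.1) -> dict:
    """Build an OpenAI-style chat-completions payload for llama-server."""
    return {
        "messages": [
            {"role": "system",
             "content": "You are a helpful AI assistant specialised in African history."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_payload("Briefly detail the significance of the Igbo god Amadioha.")
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Requires the server from the previous step to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```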

NB: For deeper details, refer to the GitHub link.

Python Implementation (via llama-cpp-python)

from llama_cpp import Llama
import os
# os.cpu_count() returns the number of logical cores
cores = os.cpu_count()

# Using roughly half the logical cores (i.e. the physical cores) is
# usually the sweet spot for llama.cpp on CPU.
cores_to_use = cores // 2
print(f"Cores on this machine: {cores}")
print(f"Cores to use: {cores_to_use}")

# Load the base model
llm = Llama(
    model_path="./phi4_mini_instruct_q4_k_m.gguf",
    lora_path="./lorafrica_lora_adapter.gguf", # comment out if you do not want LoRA Adapters
    n_ctx=2000,
    n_threads=cores_to_use,  # More threads can speed up generation, but oversubscribing hurts
    n_batch=512,       # Helps the CPU process the initial prompt faster
    n_gpu_layers=0,    # Explicitly disables GPU offloading (CPU-only)
    use_mmap=True,     # Memory-maps the model file; important on low-RAM systems
    use_mlock=False    # Keep False so the model isn't force-locked into RAM
)

output = llm.create_chat_completion(
    messages =[
        {
            "role": "system", "content": "You are a helpful AI assistant specialised in African history that gives concise answers to questions asked."
        },
        {
            "role": "user", "content": "Briefly detail the significance of the Igbo god Amadioha."
        }
    ],
    max_tokens=128,
    temperature=0.1
)

print(output["choices"][0]["message"]["content"])
# Example Output
Amadioha is a significant deity in Igbo mythology, representing justice and the thunder god. He is believed to reside in the heavens and is often invoked to settle disputes and mete out punishment for wrongdoing. Amadioha's thunderbolts are said to strike down those who commit injustices, symbolizing divine retribution. His presence underscores the importance of justice and moral order in Igbo culture.
# For streaming generated tokens

output_stream = llm.create_chat_completion(
    messages =[
        {
            "role": "system", "content": "You are a helpful AI assistant specialised in African history that gives concise answers to questions asked."
        },
        {
            "role": "user", "content": "Briefly detail the significance of the Igbo god Amadioha."
        }
    ],
    max_tokens=128,
    temperature=0.1,
    stream=True
)


for chunk in output_stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
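The delta-handling loop above can be exercised without loading a model. The sketch below feeds hypothetical chunks, shaped like llama-cpp-python's streaming output, through the same accumulation logic; the mock data is illustrative only:

```python
def collect_stream(chunks) -> str:
    """Accumulate streamed chat-completion deltas into the full reply."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # The first chunk typically carries only {"role": "assistant"};
        # later chunks carry incremental "content" pieces.
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Mock chunks mimicking the streaming structure (illustrative only):
mock_stream = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Amadioha is the Igbo "}}]},
    {"choices": [{"delta": {"content": "god of thunder and justice."}}]},
]
print(collect_stream(mock_stream))
# Amadioha is the Igbo god of thunder and justice.
```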

Ollama

Here is the link to the Ollama Model File

Citation

@misc{Ihenacho2026phi4_african_history_lora_gguf,
  author    = {Daniel Ihenacho},
  title     = {Phi-4 African History LoRA GGUF},
  year      = {2026},
  publisher = {Hugging Face Models},
  url       = {https://huggingface.co/DannyAI/LoRAfrica_GGUF},
}

Model Card Authors

Daniel Ihenacho

Model Card Contact
