Arpit Ramesan's Portfolio AI Agent (SmolLM2-360M)

Model Description

This is a custom-trained conversational AI agent designed to act as the personal representative for Arpit Ramesan, an AI/ML Engineer and aspiring AI Generalist based in Thalassery, Kerala.

The model was fine-tuned using QLoRA on a synthetic dataset generated specifically from Arpit's real-world portfolio, resume, and technical blogs. It is optimized to answer questions regarding his career timeline, technical architecture, and creative hubs, using a professional yet conversational tone.

  • Developed by: Arpit Ramesan (@mrplexar)
  • Model type: Causal Language Model (Fine-tuned via PEFT/LoRA and merged)
  • Base model: HuggingFaceTB/SmolLM2-360M-Instruct
  • Language(s): English (with internet slang/casual conversational capabilities)
  • License: Apache 2.0

Intended Uses & Limitations

Intended Use

This model is specifically designed to be deployed as the backend inference engine for a web-based portfolio OS. It expects a strict ChatML-style prompt structure and a specific system prompt to reliably recall its training.
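The expected prompt shape can be sketched as a small helper. This is a hypothetical convenience function (not part of the released code); it reproduces the exact template from the inference example below, where the system prompt is folded into the user turn.

```python
def build_prompt(system_prompt: str, user_msg: str) -> str:
    """Build the ChatML-style prompt the model was trained on.

    The system prompt is placed inside the user turn, separated from
    the question by a blank line, and the string ends with an open
    assistant turn for the model to complete.
    """
    return (
        "<|im_start|>user\n"
        f"{system_prompt}\n\n{user_msg}"
        "<|im_end|>\n<|im_start|>assistant\n"
    )
```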

Limitations

  • Scope: As a 360M-parameter "Micro-LLM," it is highly efficient but strictly domain-specific. General-knowledge questions outside the scope of Arpit's portfolio may produce hallucinations or base-model bleed-through.
  • Prompt Sensitivity: The model requires greedy decoding (temperature=0.01, do_sample=False) and the exact system prompt from its training to avoid identity drift.

How to Get Started with the Model

Use the code below to run inference using the Hugging Face InferenceClient, which requires zero local VRAM and is optimized for lightweight web servers like PythonAnywhere.

from huggingface_hub import InferenceClient

REPO_ID = "mrplexar/PortfolioLLM-RAG"
HF_TOKEN = "your_hf_read_token_here"  # replace with your own HF read token

client = InferenceClient(model=REPO_ID, token=HF_TOKEN)

SYSTEM_PROMPT = "You are the AI agent for Arpit Ramesan's portfolio. You answer questions about his career, creative projects, and technical skills using a professional, personal, and conversational tone. You know that Arpit is based in Thalassery, Kerala."
user_msg = "What kind of projects does Arpit build?"

# Apply exact training template
prompt = f"<|im_start|>user\n{SYSTEM_PROMPT}\n\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"

reply = client.text_generation(
    prompt,
    max_new_tokens=150,
    temperature=0.01,
    do_sample=False,  # greedy decoding, as required to avoid identity drift
    stop_sequences=["<|im_end|>"]
)

print(reply.strip())

Training Details

Training Data

The model was fine-tuned on a custom JSONL dataset containing 150+ diverse, synthetic Q&A pairs. The dataset was generated using an automated pipeline powered by gemma4:e2b. The data encompasses:

  • Personal Biography & Timeline (education at St. Thomas College of Engineering, ML Engineer at Incramania, Intern Lead at Saspo World Tech).

  • Technical Projects (WhatsApp DDoS Defense, Theyyam Detection System, Brain Disease Detection).

  • Live Deployments & Web Architecture.

  • Creative Hubs (YouTube, Spotify Podcasts, Pinterest).

  • Conversational augmentations (internet slang, casual formatting, varied query lengths).
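The categories above can be pictured as records in the JSONL file. The sketch below is illustrative only; the field names (`messages`, `role`, `content`) are an assumed chat-style layout, not taken from the actual dataset.

```python
import json

# One hypothetical training record: a single Q&A pair in chat form.
record = {
    "messages": [
        {"role": "user", "content": "What does Arpit do at Incramania?"},
        {"role": "assistant", "content": "Arpit works as an ML Engineer at Incramania."},
    ]
}

# JSONL = one compact JSON object per line of the file.
line = json.dumps(record)
parsed = json.loads(line)
```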

Training Procedure

The model was trained using QLoRA (Quantized Low-Rank Adaptation) to accommodate hardware constraints (NVIDIA GTX 1650 4GB VRAM) while maximizing parameter updates. The LoRA adapters were subsequently fused into the base model weights for seamless API deployment.
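The fusion step described above follows the standard LoRA merge formula, W' = W + (alpha/r) · B·A, so the served checkpoint needs no adapter at inference time. The toy function below illustrates that arithmetic on plain nested lists; it is a didactic sketch, not the author's actual merge code (which would typically use PEFT's `merge_and_unload`).

```python
def merge_lora(W, A, B, r, alpha):
    """Merge a rank-r LoRA update (B @ A), scaled by alpha/r, into W.

    W is (rows x cols), B is (rows x r), A is (r x cols); all plain
    lists of lists of floats.
    """
    scale = alpha / r
    merged = [row[:] for row in W]  # copy the base weight
    for i in range(len(W)):
        for j in range(len(W[0])):
            update = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * update
    return merged
```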

Training Hyperparameters

  • Precision: 4-bit (nf4) quantization during training, upcast to float16.

  • Batch Size: 1 (with Gradient Accumulation steps = 8).

  • Optimizer: paged_adamw_32bit.

  • Learning Rate: 2e-4 with a cosine learning rate scheduler.

  • Max Steps: 200

  • Target Modules: ["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"]

  • LoRA Rank (r): 16

  • LoRA Alpha: 32
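The hyperparameters above map onto a standard `bitsandbytes`/`peft`/`transformers` setup roughly as follows. This is a minimal configuration sketch implied by the listed values, not the author's actual training script; the `output_dir` is a placeholder.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization with float16 compute, per the precision notes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA rank/alpha and target modules as listed above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="portfolio-llm-qlora",  # hypothetical path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    fp16=True,
)
```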

Hardware & Environment

  • Hardware Type: NVIDIA GTX 1650 (Mobile)

  • VRAM Required: ~3.2 GB during QLoRA training

  • Frameworks: transformers, peft, trl, bitsandbytes, torch
