πŸ’ Monke-0.4B-v1.1

Monke-0.4B is a custom 370M-parameter language model trained from scratch to bridge the gap between general chat and Python coding capabilities.

It was trained as an educational experiment to demonstrate efficient training on consumer hardware (T4 GPU) using a hybrid dataset strategy.

πŸ’» Model Details

  • Architecture: Custom Llama-style architecture (370M parameters)
  • Training Steps: 3,050 steps
  • Final Loss: ~2.57
  • Context Window: 1024 tokens
  • Training Hardware: Single NVIDIA T4 (Google Colab Free Tier)
  • Developer: @aaravriyer193

πŸš€ Capabilities

Monke-0.4B is a "Hybrid Coder." It is designed to:

  1. Write Python Code: Can generate simple functions, loops, and standard library usage.
  2. Chat: Can handle basic conversational queries and explain concepts.
  3. Switch Modes: Trained on a 50/50 mix of instruct-chat and raw code, allowing it to understand both natural language and programming syntax.

πŸ“¦ How to Use

You can run Monke directly in Python using the transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

# 1. Load Monke
model_name = "aaravriyer193/monke-0.4b-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 2. Ask it to code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")

# 3. Generate
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
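For reference, a correct completion of the `def fibonacci(n):` prompt above would be a standard iterative implementation like the one below. This is hand-written reference code, not actual model output:

```python
def fibonacci(n):
    # Iterative Fibonacci: returns the n-th number (0-indexed),
    # with fibonacci(0) == 0 and fibonacci(1) == 1.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(10))  # 55
```

Since `do_sample=True` is used with `temperature=0.7`, the model's completion will vary between runs; comparing its output against a known-good implementation like this is a quick sanity check.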

⚠️ Limitations

  • Size: At ~370M parameters, this model is significantly smaller than Llama-3 (8B) or GPT-4. It lacks deep reasoning capabilities.
  • Hallucinations: It may confidently invent Python libraries that do not exist or state incorrect facts.
  • Logic: While it understands syntax (grammar), it may struggle with complex logic puzzles or multi-step math problems.

πŸ“œ Training Data

The model was trained on a custom-curated mix of:

  • 50% General Instruct/Chat data (Alpaca style)
  • 50% Python Code snippets (The Stack/CodeSearchNet)
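The card does not publish the exact chat template, but "Alpaca style" conventionally refers to the prompt format below. Treat the header text and field names as an assumption about the training format, not a documented API of this model:

```python
def build_alpaca_prompt(instruction, input_text=""):
    # Classic Alpaca instruct template (assumed: the card only says
    # "Alpaca style" and does not document the exact template used).
    header = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
    )
    if input_text:
        return (header
                + f"### Instruction:\n{instruction}\n\n"
                + f"### Input:\n{input_text}\n\n"
                + "### Response:\n")
    return (header
            + f"### Instruction:\n{instruction}\n\n"
            + "### Response:\n")

prompt = build_alpaca_prompt("Write a Python loop that prints 1 to 5.")
print(prompt)
```

If the model's chat replies seem incoherent with a bare prompt, wrapping the query in this template (and generating until the model stops after `### Response:`) is worth trying first.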

Built with ❀️ and a single GPU.
