πŸ’ Monke-0.4B-v1.1

Monke-0.4B is a custom 370M-parameter language model trained from scratch to bridge the gap between general chat and Python coding capabilities.

It was trained as an educational experiment to demonstrate efficient training on consumer hardware (T4 GPU) using a hybrid dataset strategy.

πŸ’» Model Details

  • Architecture: Custom Llama-style architecture (370M parameters)
  • Training Steps: 3,050 steps
  • Final Loss: ~2.57
  • Context Window: 1024 tokens
  • Training Hardware: Single NVIDIA T4 (Google Colab Free Tier)
  • Developer: @aaravriyer193

πŸš€ Capabilities

Monke-0.4B is a "Hybrid Coder." It is designed to:

  1. Write Python Code: Can generate simple functions, loops, and standard library usage.
  2. Chat: Can handle basic conversational queries and explain concepts.
  3. Switch Modes: Trained on a 50/50 mix of instruct-chat and raw code, allowing it to understand both natural language and programming syntax.

πŸ“¦ How to Use

You can run Monke directly in Python using the transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

# 1. Load Monke
model_name = "aaravriyer193/monke-0.4b-v1.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 2. Ask it to code
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")

# 3. Generate
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
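For reference, a correct completion of the `def fibonacci(n):` prompt above would be a standard iterative implementation like the one below. This is hand-written reference code, not actual model output:

```python
def fibonacci(n):
    # Iterative Fibonacci: returns the n-th number (0-indexed),
    # with fibonacci(0) == 0 and fibonacci(1) == 1.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fibonacci(10))  # 55
```

Since `do_sample=True` is used with `temperature=0.7`, the model's completion will vary between runs; comparing its output against a known-good implementation like this is a quick sanity check.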

⚠️ Limitations

  • Size: At ~370M parameters, this model is significantly smaller than Llama-3 (8B) or GPT-4. It lacks deep reasoning capabilities.
  • Hallucinations: It may confidently invent Python libraries that do not exist or state incorrect facts.
  • Logic: While it understands syntax (grammar), it may struggle with complex logic puzzles or multi-step math problems.

πŸ“œ Training Data

The model was trained on a custom-curated mix of:

  • 50% General Instruct/Chat data (Alpaca style)
  • 50% Python Code snippets (The Stack/CodeSearchNet)
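The card does not publish the exact chat template, but "Alpaca style" conventionally refers to the prompt format below. Treat the header text and field names as an assumption about the training format, not a documented API of this model:

```python
def build_alpaca_prompt(instruction, input_text=""):
    # Classic Alpaca instruct template (assumed: the card only says
    # "Alpaca style" and does not document the exact template used).
    header = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
    )
    if input_text:
        return (header
                + f"### Instruction:\n{instruction}\n\n"
                + f"### Input:\n{input_text}\n\n"
                + "### Response:\n")
    return (header
            + f"### Instruction:\n{instruction}\n\n"
            + "### Response:\n")

prompt = build_alpaca_prompt("Write a Python loop that prints 1 to 5.")
print(prompt)
```

If the model's chat replies seem incoherent with a bare prompt, wrapping the query in this template (and generating until the model stops after `### Response:`) is worth trying first.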

Built with ❀️ and a single GPU.
