🔍 Kerala Crime Detective: Malayalam + English + Manglish AI

Solve crimes the Kerala way: comedy, Manglish, and serious detective work, all in one model!

A fine-tuned Gemma 3 1B that understands Kerala crime reports, FIR details, and cyber fraud cases, and responds in Malayalam, English, or Manglish with comedy and serious investigation steps.


🎯 What This Model Does

| Mode | Description |
|---|---|
| 🎭 Malayalam Comedy | Solves crimes with Manglish humor, Kerala cultural references, local jokes |
| 🔍 Serious English | Professional CID-style investigation: evidence, suspects, legal sections |
| 🌐 Cyber Crime Expert | Specialized in UPI fraud, SIM swap, sextortion, fake jobs, investment scams |
| 🎭+🔍 Mixed Style | Comedy + serious advice combined (most popular mode) |

🚀 Try the Live Demo

👉 Open in HuggingFace Spaces


📦 Model Details

| Property | Value |
|---|---|
| Base Model | google/gemma-3-1b-it |
| Fine-tuning Method | Supervised Fine-Tuning (SFT) with TRL |
| Training Framework | HuggingFace TRL + Transformers |
| Hardware | Kaggle T4 GPU |
| Languages | Malayalam, English, Manglish |
| Parameters | ~1 Billion |
| Precision | bfloat16 |
| License | Apache 2.0 |

💻 Quick Start

Installation

pip install transformers accelerate torch

Basic Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "wincode/kerala-crime-detective-gemma"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    attn_implementation="eager"
)
model.eval()

def solve_crime(crime_report: str, mode: str = "comedy") -> str:
    system_prompts = {
        "comedy": (
            "You are Kerala Nadodikattu Detective, a comedy crime solver. "
            "Solve crimes using Malayalam, Manglish and English mix. "
            "Use humor, local references, Kerala culture."
        ),
        "serious": (
            "You are a senior Kerala Police CID officer. "
            "Analyze crime reports professionally with evidence analysis, "
            "suspect profiling, investigation steps and legal sections."
        ),
        "cyber": (
            "You are Kerala Cyber Cell's top investigator. "
            "Specialize in UPI fraud, SIM swap, sextortion, fake jobs. "
            "Provide immediate victim steps, recovery options, legal recourse."
        ),
    }

    messages = [
        {"role": "system", "content": system_prompts.get(mode, system_prompts["comedy"])},
        {"role": "user",   "content": crime_report},
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
    )

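    # The chat template already inserts <bos>; skip special tokens here to avoid a duplicate.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)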

    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=400,
            do_sample=True,
            temperature=0.8,
            top_p=0.9,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.pad_token_id,
        )

    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# ── Example 1: Malayalam Comedy Mode ──
report1 = """
Crime Report:
Location: Thrissur Pooram grounds
Time: 11 PM
Crime: 2kg gold ornaments missing from elephant caparison
Evidence: Footprints, torn dhoti piece
FIR: THR/2024/445
"""
print(solve_crime(report1, mode="comedy"))


# ── Example 2: Cyber Crime Mode ──
report2 = """
Cyber Crime FIR:
Victim: Anitha, Homemaker, Palakkad
Crime: Received WhatsApp message saying I won Rs 50 lakh lottery.
They asked Rs 15,000 processing fee. I paid. Number now switched off.
Amount lost: Rs 15,000
"""
print(solve_crime(report2, mode="cyber"))


# ── Example 3: Serious Investigation Mode ──
report3 = """
Hit and Run:
Location: NH-66 Kannur Bypass
Time: 11:30 PM
Victim: Biker, critical, ICU
Evidence: White paint transfer on victim bike
Witness: Trucker saw partial plate KL-13
"""
print(solve_crime(report3, mode="serious"))
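
Note that `solve_crime` falls back to the comedy prompt whenever it receives an unknown `mode`, so a typo such as `"cybr"` goes unnoticed. The sketch below shows one way to fail loudly instead. The names `SYSTEM_PROMPTS` and `build_messages` are illustrative assumptions, not part of the released code, and each prompt is cut down to the first sentence of the corresponding prompt in `solve_crime`.

```python
# Hypothetical helper (not part of the released code): reject unknown modes
# instead of silently defaulting to the comedy prompt.
SYSTEM_PROMPTS = {
    "comedy": "You are Kerala Nadodikattu Detective, a comedy crime solver.",
    "serious": "You are a senior Kerala Police CID officer.",
    "cyber": "You are Kerala Cyber Cell's top investigator.",
}


def build_messages(crime_report: str, mode: str = "comedy") -> list:
    """Return a chat-format message list for the given mode."""
    if mode not in SYSTEM_PROMPTS:
        valid = ", ".join(sorted(SYSTEM_PROMPTS))
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {valid}")
    return [
        {"role": "system", "content": SYSTEM_PROMPTS[mode]},
        # Strip the leading/trailing newlines left by triple-quoted reports.
        {"role": "user", "content": crime_report.strip()},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the `messages` list built inside `solve_crime`.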

🗂️ Training Dataset

Trained on the Kerala Crime Comedy Dataset, a custom dataset covering:

| Category | Examples |
|---|---|
| 🎭 Malayalam Comedy Crime | Gold theft, car theft, chain snatching, bicycle theft |
| 💻 Cyber Crimes | Investment fraud, SIM swap, sextortion, OLX scam, fake jobs |
| 🔍 Serious Crime Solving | Murder investigation, drug bust, hit and run |
| 🏠 Property Crime | House breaking, land disputes |
| 💙 Support Cases | Domestic violence, senior citizen fraud, missing persons |
| 🌊 Environmental Crime | Illegal sand mining |
| 💰 Financial Fraud | Microfinance harassment, lottery fraud |
| 😄 Comedy Light Cases | Christmas star theft, goat eating crops, road rage |

Dataset stats:

  • Total examples: 21+ (growing)
  • Languages: Malayalam, English, Manglish
  • Format: Chat (system + user + assistant)
  • Average response length: 300-500 tokens
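
To make the chat format concrete, here is a rough sketch of what a single record might look like when stored as one JSONL line. TRL's SFTTrainer accepts conversational records under a `messages` key, but whether this dataset uses that exact column name is an assumption, and the text in the record is made up for illustration rather than copied from the dataset.

```python
import json

# Hypothetical record: the "messages" field name and the text below are
# illustrative assumptions, not an actual row from the dataset.
record = {
    "messages": [
        {"role": "system", "content": "You are Kerala Cyber Cell's top investigator."},
        {"role": "user", "content": "Cyber Crime FIR: WhatsApp lottery message, Rs 15,000 lost."},
        {"role": "assistant", "content": "Call 1930 immediately and report on cybercrime.gov.in."},
    ]
}


def to_jsonl_line(rec: dict) -> str:
    """Serialize one record as a single line; ensure_ascii=False keeps Malayalam script readable."""
    return json.dumps(rec, ensure_ascii=False)


def roles(rec: dict) -> list:
    """Return the sequence of roles in a record, e.g. to validate turn order."""
    return [message["role"] for message in rec["messages"]]


line = to_jsonl_line(record)
```

Checking `roles(...)` on every record before training is a cheap way to catch rows that are missing the assistant turn.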

📊 Example Outputs

🎭 Malayalam Comedy Mode

Input:

Crime: Gold chain snatching at Calicut beach. Victim: teacher. Suspect: young male on motorcycle.

Output:

⚡ Chain Snatching Investigation Report!

Chechi! Chain poyi aano?
Bike-il vannu, chain eduthu, bike-il poyi!
Classic Calicut evening crime!

🔍 Evidence Analysis:
- Skid marks: Direction analysis possible
- Time: Evening = witnesses available
- Victim is teacher: Respected community member

Serious Steps:
✅ Alert gold shops in 20km radius IMMEDIATELY
✅ Deploy two-wheeler patrol on beach road
✅ CCTV from SM Street main junction
✅ Recovery chance: 60% if acted within 6 hours!

Chechi, we will get justice! 💪

🌐 Cyber Crime Expert Mode

Input:

UPI fraud: Someone sent QR code saying it will send me money. I scanned and entered PIN. Rs 35,000 deducted.

Output:

💸 UPI QR Scam - Classic Trap!

CALL 1930 RIGHT NOW (Cyber Crime Helpline - Free, 24/7)
Every minute counts for account freeze!

How the scam worked:
In UPI there are TWO types of QR codes:
📥 COLLECT request = You are PAYING (needs PIN = money GOES OUT)
📤 PAYMENT = You are RECEIVING (NO PIN needed ever)

GOLDEN RULE: You NEVER enter PIN to RECEIVE money!
If someone asks PIN to send you money = 100% SCAM always!

Recovery steps:
1. Call 1930 immediately
2. Report to your bank fraud line
3. Screenshot the UPI ID and report on cybercrime.gov.in

Recovery probability: 50-60% if reported within 4 hours!

⚠️ Limitations

  • Model is fine-tuned on a small dataset (21 examples), so responses may not always be perfectly formatted
  • Malayalam script quality depends on base model's multilingual capability
  • For real emergencies, always contact actual Kerala Police: 100 or Cyber Crime: 1930
  • Model provides educational and entertainment value; it is not a substitute for real legal advice
  • Responses may vary due to sampling temperature
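
Run-to-run variation from sampling can be reduced by switching to greedy decoding. The sketch below builds keyword arguments for `model.generate()` in either style. The helper name `generation_kwargs` is a hypothetical addition, the sampling values are copied from `solve_crime` in Quick Start, and greedy output may still differ across hardware or library versions.

```python
def generation_kwargs(deterministic: bool = False, max_new_tokens: int = 400) -> dict:
    """Build keyword arguments for model.generate() (hypothetical helper)."""
    kwargs = {"max_new_tokens": max_new_tokens, "repetition_penalty": 1.1}
    if deterministic:
        # Greedy decoding: always pick the most likely next token.
        # temperature and top_p are left out because they only apply when sampling.
        kwargs["do_sample"] = False
    else:
        # Sampling settings copied from solve_crime in Quick Start.
        kwargs.update(do_sample=True, temperature=0.8, top_p=0.9)
    return kwargs
```

Inside `solve_crime`, the call would then read `model.generate(**inputs, pad_token_id=tokenizer.pad_token_id, **generation_kwargs(deterministic=True))`. Greedy decoding trades variety for repeatability, so the comedy mode may read flatter with it.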

🛡️ Important Disclaimer

This model is for educational and entertainment purposes only.

For real crimes and emergencies:

  • Police Emergency: 100
  • Cyber Crime Helpline: 1930
  • Women's Helpline: 1091
  • Child Helpline: 1098
  • Cybercrime Portal: cybercrime.gov.in

🏋️ Training Details

# Fine-tuning configuration used
from trl import SFTConfig

sft_config = SFTConfig(
    max_length=1024,
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch = 16
    gradient_checkpointing=True,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    bf16=True,
    optim="adamw_torch_fused",
)

Hardware used: Kaggle T4 GPU (15GB VRAM)
Training time: ~25 minutes for 5 epochs
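
As a back-of-envelope check on the configuration above, the sketch below works out the effective batch size and a rough count of optimizer steps. It assumes a single GPU and the 21-example dataset size stated under Limitations. The function name `batch_math` is hypothetical, and the step count is an upper-bound estimate, since the Trainer's exact accounting of a partial accumulation window can differ between library versions.

```python
import math


def batch_math(num_examples: int, per_device_batch: int, grad_accum: int,
               epochs: int, num_devices: int = 1) -> dict:
    """Rough training arithmetic; not the Trainer's exact step accounting."""
    # Examples contributing to one optimizer update.
    effective_batch = per_device_batch * grad_accum * num_devices
    # Forward/backward passes needed to see every example once.
    micro_batches = math.ceil(num_examples / (per_device_batch * num_devices))
    # Upper bound: assumes a leftover partial accumulation window still steps.
    steps_per_epoch = math.ceil(micro_batches / grad_accum)
    return {
        "effective_batch": effective_batch,
        "micro_batches_per_epoch": micro_batches,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }


stats = batch_math(num_examples=21, per_device_batch=2, grad_accum=8, epochs=5)
```

Under these assumptions the whole run amounts to roughly ten optimizer updates, which is worth keeping in mind when reading the Limitations section.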


🗺️ Roadmap

  • Expand dataset to 500+ examples
  • Add more Malayalam script examples
  • Add Manglish-only mode
  • Support for audio input (voice crime reports)
  • Add more cyber crime patterns (2024-2025 new scams)
  • Quantized version (GGUF) for local deployment
  • API endpoint for police department integration

Related Resources

| Resource | Link |
|---|---|
| 🤗 Model | wincode/kerala-crime-detective-gemma |
| 📊 Dataset | wincode/kerala-crime-comedy-dataset |
| 🎮 Live Demo | Spaces: kerala-crime-detective |
| 🏗️ Base Model | google/gemma-3-1b-it |

🙏 Credits

  • Base Model: Google Gemma 3 (thank you, Google DeepMind)
  • Fine-tuning: HuggingFace TRL (SFTTrainer)
  • Training Platform: Kaggle (free T4 GPU)
  • Demo Framework: Gradio
  • Inspiration: Kerala Police, Kerala comedy films, and every aunty who knows everything 🙏

📜 License

This model is released under the Apache 2.0 License.

The base model Gemma 3 is subject to Google's Gemma Terms of Use.


Made with ❤️ in Kerala 🌴 | Nammude Kerala, Nammude Detective!
