omnibrowser-planner-1p5b-q4f16_1-MLC

A custom fine-tuned and quantized planner model for the omnibrowser-agent, a local-first, privacy-focused browser AI operator that runs entirely in the browser via WebGPU.

This model is the MLC/WebLLM-ready quantized version (q4f16_1) of the fine-tuned planner, designed to run on-device with zero API calls and no cloud costs.


Model Details

Property              Value
Base model            Qwen/Qwen2.5-1.5B-Instruct
Fine-tuning method    LoRA (r=16, alpha=32)
Training hardware     Google Colab TPU v2-8
Training precision    bfloat16
Quantization          q4f16_1 (4-bit weights, float16 activations)
Quantization format   MLC-LLM (WebLLM compatible)
Inference target      In-browser via WebGPU (WebLLM)
Max sequence length   512 tokens
Parameters            ~1.5B

What it does

This model acts as a browser action planner. Given a high-level user goal (e.g., "Search contact John Doe in CRM and open profile"), it generates structured browser actions like click, type, navigate, extract, scroll, and done.

It was fine-tuned on a custom dataset (omnibrowser_planner_train.jsonl) of goal → action-sequence pairs, covering common web automation patterns.
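The dataset itself is not published in this card; as a purely illustrative sketch, one training pair might pair a goal string with an ordered action list (field names and selectors below are assumptions, not the actual dataset schema):

```json
{
  "goal": "Search contact John Doe in CRM and open profile",
  "actions": [
    { "action": "type", "selector": "#crm-search", "text": "John Doe" },
    { "action": "click", "selector": ".result-row" },
    { "action": "done" }
  ]
}
```

In the JSONL file each such pair would occupy a single line.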


Training Details

  • Base model: Qwen/Qwen2.5-1.5B-Instruct
  • LoRA targets: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj
  • Epochs: 3
  • Batch size: 4 per device × 2 gradient accumulation steps
  • Learning rate: 2e-5 with cosine schedule + 3% warmup
  • Optimizer: Adafactor (TPU-optimized)
  • Validation split: 10%
  • Merge: LoRA weights merged into full model before quantization

How to Use

This model is designed to run inside the browser using WebLLM via the omnibrowser-agent library.

With omnibrowser-agent (recommended)

import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";

// 1. Set up the WebLLM bridge on the window object
//    (see docs/EMBEDDING.md in the omnibrowser-agent repo for full example)
window.__browserAgentWebLLM = {
  generate: async (prompt) => {
    // your WebLLM engine call here (OpenAI-style chat API);
    // the bridge is assumed to return the reply text as a string
    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: prompt }],
    });
    return reply.choices[0].message.content;
  }
};

// 2. Create and run the agent
const agent = createBrowserAgent({
  goal: "Open CRM and find customer John Smith",
  mode: "human-approved",
  planner: { kind: "webllm" }
}, {
  onStep: (result) => console.log(result.message),
  onApprovalRequired: (action) => console.log("Needs approval:", action),
  onDone: (result) => console.log("Done:", result.message),
  onMaxStepsReached: (session) => console.log("Max steps hit", session.history)
});

await agent.start();

// Resume after human approval step:
await agent.resume();

// Stop at any time:
agent.stop();

With AbortSignal

const controller = new AbortController();

const agent = createBrowserAgent({
  goal: "Navigate to the settings page",
  signal: controller.signal,
  planner: { kind: "webllm" }
});

agent.start();

// Cancel from outside:
controller.abort();

Loading the model in WebLLM

import * as webllm from "https://esm.run/@mlc-ai/web-llm";

const engine = await webllm.CreateMLCEngine(
  "omnibrowser-planner-q4f16_1",  // custom model ID
  {
    model_list: [
      {
        model: "https://huggingface.co/Akshayram1/omnibrowser-planner-1p5b-q4f16_1-MLC",
        model_id: "omnibrowser-planner-q4f16_1",
        model_lib: webllm.modelLibURLPrefix + webllm.modelVersion + "/Qwen2.5-1.5B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
      }
    ]
  }
);

Note: The .wasm model library is shared with the base Qwen2.5-1.5B-Instruct-q4f16_1 since the architecture is identical; only the weights differ.
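Once the engine is created, the planner can be queried with an OpenAI-style chat call and its JSON action parsed out of the reply. A minimal sketch (the system-prompt wording here is illustrative, not the exact prompt used in training):

```javascript
// Build a chat request for the planner. The prompt text is an
// assumption for illustration; use your own planner prompt.
function buildPlannerMessages(goal) {
  return [
    { role: "system", content: "You are a browser action planner. Reply with a single JSON action." },
    { role: "user", content: goal },
  ];
}

// Pull the first JSON object out of a model reply, tolerating any
// surrounding prose the model may emit.
function extractActionJson(reply) {
  const match = reply.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON action found in model reply");
  return JSON.parse(match[0]);
}

// Usage with the engine created above (browser only):
// const reply = await engine.chat.completions.create({
//   messages: buildPlannerMessages("Navigate to https://example.com"),
// });
// const action = extractActionJson(reply.choices[0].message.content);
```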


Supported Actions

The model outputs structured actions in the following format:

Action     Description
click      Click an element by CSS selector
type       Type text into an input or textarea
navigate   Navigate to a URL
extract    Extract text from an element
scroll     Scroll a container or the page
focus      Focus an element (e.g. dropdowns)
wait       Pause for N milliseconds
done       Signal task completion
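Because the model's output drives real browser actions, it is worth validating each action object before executing it. A sketch of such a guard (the required fields per action are assumptions inferred from the table above, not the library's authoritative schema):

```javascript
// Required fields per action type — assumed from the descriptions above.
const REQUIRED_FIELDS = {
  click: ["selector"],
  type: ["selector", "text"],
  navigate: ["url"],
  extract: ["selector"],
  scroll: [],
  focus: ["selector"],
  wait: ["ms"],
  done: [],
};

// Parse (if needed) and validate a planner action before executing it.
function validateAction(raw) {
  const obj = typeof raw === "string" ? JSON.parse(raw) : raw;
  const required = REQUIRED_FIELDS[obj.action];
  if (!required) throw new Error(`Unknown action: ${obj.action}`);
  for (const field of required) {
    if (!(field in obj)) throw new Error(`${obj.action} is missing "${field}"`);
  }
  return obj;
}
```

Rejecting malformed output early keeps a single bad generation from clicking the wrong element.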

Example Interaction

Input goal:

Navigate to https://example.com

Expected model output (action JSON):

{
  "action": "navigate",
  "url": "https://example.com"
}
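To make the action/execution split concrete, here is a hypothetical dispatcher (not the library's internal one) showing how such an action maps onto browser APIs:

```javascript
// Hypothetical dispatcher: maps a planner action onto browser calls.
// `win` is the window-like object to act on; returns false when the
// agent loop should stop.
function dispatch(action, win) {
  switch (action.action) {
    case "navigate":
      win.location.href = action.url;
      return true;
    case "done":
      return false;
    default:
      throw new Error(`Unhandled action: ${action.action}`);
  }
}
```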

Limitations

  • Optimized for short, goal-oriented browser automation tasks (max 512 tokens context)
  • Works best with goals matching patterns seen in training data (CRM navigation, form filling, search)
  • webllm mode requires a WebGPU-capable browser (Chrome 113+, Edge 113+)
  • Local inference uses the device GPU/CPU; performance depends on hardware
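Since WebGPU support is the hard requirement, a quick feature check before loading the model avoids a confusing failure later; `navigator.gpu` is the standard detection point:

```javascript
// Detect WebGPU availability before attempting to load the model.
// `nav` defaults to the page's navigator; in non-browser contexts it
// is undefined and the check returns false.
function hasWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && "gpu" in nav);
}

// Usage: if (!hasWebGPU()) { /* fall back or show an upgrade notice */ }
```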

Related Resources


License

MIT © Akshay Chame


Citation

If you use this model or the omnibrowser-agent library, please link back to the GitHub repo.
