omnibrowser-planner-1p5b-q4f16_1-MLC

A custom fine-tuned and quantized planner model for the omnibrowser-agent, a local-first, privacy-focused browser AI operator that runs entirely in the browser via WebGPU.

This model is the MLC/WebLLM-ready quantized version (q4f16_1) of the fine-tuned planner, designed to run on-device with zero API calls and no cloud costs.


Model Details

Property              Value
Base model            Qwen/Qwen2.5-1.5B-Instruct
Fine-tuning method    LoRA (r=16, alpha=32)
Training hardware     Google Colab TPU v2-8
Training precision    bfloat16
Quantization          q4f16_1 (4-bit weights, float16 activations)
Quantization format   MLC-LLM (WebLLM compatible)
Inference target      In-browser via WebGPU (WebLLM)
Max sequence length   512 tokens
Parameters            ~1.5B

What it does

This model acts as a browser action planner. Given a high-level user goal (e.g., "Search contact John Doe in CRM and open profile"), it generates structured browser actions like click, type, navigate, extract, scroll, and done.

It was fine-tuned on a custom dataset (omnibrowser_planner_train.jsonl) of goal → action-sequence pairs, covering common web automation patterns.
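The dataset itself is not published in this card; as a purely illustrative sketch, one training pair might pair a goal string with an ordered action list (field names and selectors below are assumptions, not the actual dataset schema):

```json
{
  "goal": "Search contact John Doe in CRM and open profile",
  "actions": [
    { "action": "type", "selector": "#crm-search", "text": "John Doe" },
    { "action": "click", "selector": ".result-row" },
    { "action": "done" }
  ]
}
```

In the JSONL file each such pair would occupy a single line.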


Training Details

  • Base model: Qwen/Qwen2.5-1.5B-Instruct
  • LoRA targets: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj
  • Epochs: 3
  • Batch size: 4 per device × 2 gradient accumulation steps
  • Learning rate: 2e-5 with cosine schedule + 3% warmup
  • Optimizer: Adafactor (TPU-optimized)
  • Validation split: 10%
  • Merge: LoRA weights merged into full model before quantization

How to Use

This model is designed to run inside the browser using WebLLM via the omnibrowser-agent library.

With omnibrowser-agent (recommended)

import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";

// 1. Set up the WebLLM bridge on the window object
//    (see docs/EMBEDDING.md in the omnibrowser-agent repo for full example)
window.__browserAgentWebLLM = {
  generate: async (prompt) => {
    // your WebLLM engine call here (OpenAI-style chat API);
    // the bridge is assumed to return the reply text as a string
    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: prompt }],
    });
    return reply.choices[0].message.content;
  }
};

// 2. Create and run the agent
const agent = createBrowserAgent({
  goal: "Open CRM and find customer John Smith",
  mode: "human-approved",
  planner: { kind: "webllm" }
}, {
  onStep: (result) => console.log(result.message),
  onApprovalRequired: (action) => console.log("Needs approval:", action),
  onDone: (result) => console.log("Done:", result.message),
  onMaxStepsReached: (session) => console.log("Max steps hit", session.history)
});

await agent.start();

// Resume after human approval step:
await agent.resume();

// Stop at any time:
agent.stop();

With AbortSignal

const controller = new AbortController();

const agent = createBrowserAgent({
  goal: "Navigate to the settings page",
  signal: controller.signal,
  planner: { kind: "webllm" }
});

agent.start();

// Cancel from outside:
controller.abort();

Loading the model in WebLLM

import * as webllm from "https://esm.run/@mlc-ai/web-llm";

const engine = await webllm.CreateMLCEngine(
  "omnibrowser-planner-q4f16_1",  // custom model ID
  {
    model_list: [
      {
        model: "https://huggingface.co/Akshayram1/omnibrowser-planner-1p5b-q4f16_1-MLC",
        model_id: "omnibrowser-planner-q4f16_1",
        model_lib: webllm.modelLibURLPrefix + webllm.modelVersion + "/Qwen2.5-1.5B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
      }
    ]
  }
);

Note: The .wasm model library is shared with the base Qwen2.5-1.5B-Instruct-q4f16_1 since the architecture is identical; only the weights differ.
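Once the engine is created, the planner can be queried with an OpenAI-style chat call and its JSON action parsed out of the reply. A minimal sketch (the system-prompt wording here is illustrative, not the exact prompt used in training):

```javascript
// Build a chat request for the planner. The prompt text is an
// assumption for illustration; use your own planner prompt.
function buildPlannerMessages(goal) {
  return [
    { role: "system", content: "You are a browser action planner. Reply with a single JSON action." },
    { role: "user", content: goal },
  ];
}

// Pull the first JSON object out of a model reply, tolerating any
// surrounding prose the model may emit.
function extractActionJson(reply) {
  const match = reply.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON action found in model reply");
  return JSON.parse(match[0]);
}

// Usage with the engine created above (browser only):
// const reply = await engine.chat.completions.create({
//   messages: buildPlannerMessages("Navigate to https://example.com"),
// });
// const action = extractActionJson(reply.choices[0].message.content);
```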


Supported Actions

The model outputs structured actions in the following format:

Action     Description
click      Click an element by CSS selector
type       Type text into an input or textarea
navigate   Navigate to a URL
extract    Extract text from an element
scroll     Scroll a container or the page
focus      Focus an element (e.g. dropdowns)
wait       Pause for N milliseconds
done       Signal task completion
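Because the model's output drives real browser actions, it is worth validating each action object before executing it. A sketch of such a guard (the required fields per action are assumptions inferred from the table above, not the library's authoritative schema):

```javascript
// Required fields per action type — assumed from the descriptions above.
const REQUIRED_FIELDS = {
  click: ["selector"],
  type: ["selector", "text"],
  navigate: ["url"],
  extract: ["selector"],
  scroll: [],
  focus: ["selector"],
  wait: ["ms"],
  done: [],
};

// Parse (if needed) and validate a planner action before executing it.
function validateAction(raw) {
  const obj = typeof raw === "string" ? JSON.parse(raw) : raw;
  const required = REQUIRED_FIELDS[obj.action];
  if (!required) throw new Error(`Unknown action: ${obj.action}`);
  for (const field of required) {
    if (!(field in obj)) throw new Error(`${obj.action} is missing "${field}"`);
  }
  return obj;
}
```

Rejecting malformed output early keeps a single bad generation from clicking the wrong element.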

Example Interaction

Input goal:

Navigate to https://example.com

Expected model output (action JSON):

{
  "action": "navigate",
  "url": "https://example.com"
}
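To make the action/execution split concrete, here is a hypothetical dispatcher (not the library's internal one) showing how such an action maps onto browser APIs:

```javascript
// Hypothetical dispatcher: maps a planner action onto browser calls.
// `win` is the window-like object to act on; returns false when the
// agent loop should stop.
function dispatch(action, win) {
  switch (action.action) {
    case "navigate":
      win.location.href = action.url;
      return true;
    case "done":
      return false;
    default:
      throw new Error(`Unhandled action: ${action.action}`);
  }
}
```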

Limitations

  • Optimized for short, goal-oriented browser automation tasks (max 512 tokens context)
  • Works best with goals matching patterns seen in training data (CRM navigation, form filling, search)
  • webllm mode requires a WebGPU-capable browser (Chrome 113+, Edge 113+)
  • Local inference uses the device GPU/CPU; performance depends on hardware
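Since WebGPU support is the hard requirement, a quick feature check before loading the model avoids a confusing failure later; `navigator.gpu` is the standard detection point:

```javascript
// Detect WebGPU availability before attempting to load the model.
// `nav` defaults to the page's navigator; in non-browser contexts it
// is undefined and the check returns false.
function hasWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && "gpu" in nav);
}

// Usage: if (!hasWebGPU()) { /* fall back or show an upgrade notice */ }
```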

Related Resources


License

MIT © Akshay Chame


Citation

If you use this model or the omnibrowser-agent library, please link back to the GitHub repo.
