# omnibrowser-planner-1p5b-q4f16_1-MLC

A custom fine-tuned and quantized planner model for omnibrowser-agent, a local-first, privacy-focused browser AI operator that runs entirely in the browser via WebGPU.
This model is the MLC/WebLLM-ready quantized version (q4f16_1) of the fine-tuned planner, designed to run on-device with zero API calls and no cloud costs.
## Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen2.5-1.5B-Instruct |
| Fine-tuning method | LoRA (r=16, alpha=32) |
| Training hardware | Google Colab TPU v2-8 |
| Training precision | bfloat16 |
| Quantization | q4f16_1 (4-bit weights, float16 activations) |
| Quantization format | MLC-LLM (WebLLM compatible) |
| Inference target | In-browser via WebGPU (WebLLM) |
| Max sequence length | 512 tokens |
| Parameters | ~1.5B |
## What it does

This model acts as a browser action planner. Given a high-level user goal (e.g., "Search contact John Doe in CRM and open profile"), it generates structured browser actions such as `click`, `type`, `navigate`, `extract`, `scroll`, and `done`.

It was fine-tuned on a custom dataset (`omnibrowser_planner_train.jsonl`) of goal → action-sequence pairs, covering common web automation patterns.
## Training Details

- Base model: `Qwen/Qwen2.5-1.5B-Instruct`
- LoRA targets: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `up_proj`, `down_proj`, `gate_proj`
- Epochs: 3
- Batch size: 4 per device × 2 gradient accumulation steps
- Learning rate: 2e-5 with cosine schedule + 3% warmup
- Optimizer: Adafactor (TPU-optimized)
- Validation split: 10%
- Merge: LoRA weights merged into the full model before quantization
## How to Use
This model is designed to run inside the browser using WebLLM via the omnibrowser-agent library.
### With omnibrowser-agent (recommended)
```js
import { createBrowserAgent } from "@akshayram1/omnibrowser-agent";

// 1. Set up the WebLLM bridge on the window object
//    (see docs/EMBEDDING.md in the omnibrowser-agent repo for the full example)
window.__browserAgentWebLLM = {
  generate: async (prompt) => {
    // your WebLLM engine call here
    return await engine.chat.completions.create({ ... });
  }
};

// 2. Create and run the agent
const agent = createBrowserAgent({
  goal: "Open CRM and find customer John Smith",
  mode: "human-approved",
  planner: { kind: "webllm" }
}, {
  onStep: (result) => console.log(result.message),
  onApprovalRequired: (action) => console.log("Needs approval:", action),
  onDone: (result) => console.log("Done:", result.message),
  onMaxStepsReached: (session) => console.log("Max steps hit", session.history)
});

await agent.start();

// Resume after a human approval step:
await agent.resume();

// Stop at any time:
agent.stop();
```
### With AbortSignal
```js
const controller = new AbortController();

const agent = createBrowserAgent({
  goal: "Navigate to the settings page",
  signal: controller.signal,
  planner: { kind: "webllm" }
});

agent.start();

// Cancel from outside:
controller.abort();
```
### Loading the model in WebLLM
```js
import * as webllm from "https://esm.run/@mlc-ai/web-llm";

const engine = await webllm.CreateMLCEngine(
  "omnibrowser-planner-q4f16_1", // custom model ID
  {
    model_list: [
      {
        model: "https://huggingface.co/Akshayram1/omnibrowser-planner-1p5b-q4f16_1-MLC",
        model_id: "omnibrowser-planner-q4f16_1",
        model_lib: webllm.modelLibURLPrefix + webllm.modelVersion + "/Qwen2.5-1.5B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
      }
    ]
  }
);
```
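Once the engine is loaded, it can be wired into the bridge the agent expects. A minimal sketch, assuming the engine's OpenAI-style `chat.completions.create` call returns its text in `choices[0].message.content` (the `makeWebLLMBridge` factory is a hypothetical helper, not part of the library):

```javascript
// Hypothetical helper: adapts a WebLLM-style engine to the
// generate(prompt) shape that omnibrowser-agent's bridge expects.
function makeWebLLMBridge(engine) {
  return {
    generate: async (prompt) => {
      const reply = await engine.chat.completions.create({
        messages: [{ role: "user", content: prompt }],
        temperature: 0, // deterministic plans are easier to validate
      });
      return reply.choices[0].message.content;
    },
  };
}

// In the browser:
// window.__browserAgentWebLLM = makeWebLLMBridge(engine);
```

Keeping the adapter as a pure factory makes it easy to swap the engine or unit-test the bridge with a mock.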
> **Note:** The `.wasm` model library is shared with the base `Qwen2.5-1.5B-Instruct-q4f16_1` since the architecture is identical; only the weights differ.
## Supported Actions
The model outputs structured actions in the following format:
| Action | Description |
|---|---|
| `click` | Click an element by CSS selector |
| `type` | Type text into an input or textarea |
| `navigate` | Navigate to a URL |
| `extract` | Extract text from an element |
| `scroll` | Scroll a container or the page |
| `focus` | Focus an element (e.g. dropdowns) |
| `wait` | Pause for N milliseconds |
| `done` | Signal task completion |
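An embedder may want to validate the model's JSON against this action set before executing a step. A minimal sketch; the `parseAction` helper is an assumption (not part of the library), and field names other than `url` are guessed from the descriptions above:

```javascript
// Hypothetical validator for the action JSON the planner emits.
// Field names besides "url" are assumptions based on the action table.
const REQUIRED_FIELDS = {
  click: ["selector"],
  type: ["selector", "text"],
  navigate: ["url"],
  extract: ["selector"],
  scroll: [],
  focus: ["selector"],
  wait: ["ms"],
  done: [],
};

function parseAction(raw) {
  const obj = JSON.parse(raw);
  const fields = REQUIRED_FIELDS[obj.action];
  if (!fields) throw new Error(`Unknown action: ${obj.action}`);
  for (const f of fields) {
    if (!(f in obj)) throw new Error(`Action "${obj.action}" missing field "${f}"`);
  }
  return obj;
}
```

Rejecting unknown actions early keeps a bad generation from reaching the DOM executor.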
## Example Interaction
Input goal:

```text
Navigate to https://example.com
```

Expected model output (action JSON):

```json
{
  "action": "navigate",
  "url": "https://example.com"
}
```
## Limitations
- Optimized for short, goal-oriented browser automation tasks (max 512-token context)
- Works best with goals matching patterns seen in the training data (CRM navigation, form filling, search)
- `webllm` mode requires a WebGPU-capable browser (Chrome 113+, Edge 113+)
- Local inference uses the device GPU/CPU; performance depends on hardware
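Because `webllm` mode depends on WebGPU, a feature check before loading the engine avoids a confusing failure on unsupported browsers. A sketch using the standard `navigator.gpu` entry point (the `supportsWebGPU` helper itself is hypothetical):

```javascript
// Returns true when the WebGPU API is exposed (Chrome/Edge 113+).
// Takes the navigator object as a parameter so it can also be
// exercised outside a browser.
function supportsWebGPU(nav = globalThis.navigator) {
  return Boolean(nav && "gpu" in nav);
}

// In the browser:
// if (!supportsWebGPU()) { /* fall back to a remote planner or show a notice */ }
```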
## Related Resources

- omnibrowser-agent repo: github.com/akshayram1/omnibrowser-agent
- Live demo: omnibrowser-agent.vercel.app
- Merged (unquantized) model: Akshayram1/omnibrowser-planner-1p5b
- Embedding guide: `docs/EMBEDDING.md` in the repo
- Training notebook: `notebook/custom_quantized_llm_colab_copy.ipynb`
## License

MIT © Akshay Chame
## Citation
If you use this model or the omnibrowser-agent library, please link back to the GitHub repo.