6Net: 6-Axis Visual Robot Policy (~111M)

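A custom transformer policy for visual 6-DoF robot arm control: it takes a camera image and the current joint angles and predicts six joint targets. The policy is trained from scratch with full-parameter updates (no LoRA); the ResNet-18 visual encoder is fine-tuned along with it.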

Component	Detail	Params
Visual Encoder	ResNet-18 fine-tuned	~11.7M
Visual Projection	Linear(512→768)	~0.4M
State Encoder	MLP(6→256→768)	~0.2M
Transformer	14L · d=768 · 12h · ffn=3072	~99.1M
Action Head	MLP(768→256→6)	~0.2M
Total		~111M
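
The per-component figures can be roughly reproduced by hand. The sketch below is a back-of-the-envelope check, not the author's accounting: it assumes standard linear layers with biases for the projection and the two MLPs, counts only the attention and feed-forward weight matrices for the transformer (biases and LayerNorm omitted), and takes the ResNet-18 figure from the table as given. The helper names are illustrative.

```python
from typing import Sequence


def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameter count of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def mlp_params(dims: Sequence[int]) -> int:
    """Parameter count of an MLP with the given layer widths (biases included)."""
    return sum(linear_params(a, b) for a, b in zip(dims, dims[1:]))


def transformer_weight_params(layers: int, d: int, ffn: int) -> int:
    """Weight matrices only: Q, K, V, output projection, and the two FFN layers."""
    attention = 4 * d * d
    feed_forward = 2 * d * ffn
    return layers * (attention + feed_forward)


counts = {
    "visual_encoder": 11_700_000,  # taken from the table, not recomputed
    "visual_projection": linear_params(512, 768),
    "state_encoder": mlp_params([6, 256, 768]),
    "transformer": transformer_weight_params(14, 768, 3072),
    "action_head": mlp_params([768, 256, 6]),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:18s} {n / 1e6:6.2f}M")
print(f"{'total':18s} {total / 1e6:6.2f}M")
```

Under these assumptions each row lands on the table's rounded value, and the sum comes to about 111.6M, in line with the listed total.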

Dataset: synthetic · Steps: 455 · Effective batch size: 32
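
For a sense of scale, the step count and effective batch size imply how many samples were processed during training, assuming "Steps" counts optimizer updates. The snippet also lists ways an effective batch of 32 could be reached through gradient accumulation; the micro-batch and accumulation values are hypothetical, since the source states only the product.

```python
steps = 455
effective_batch = 32

# Samples processed over the whole run (with repeats, if the dataset is smaller).
samples_seen = steps * effective_batch
print(samples_seen)  # 14560

# Hypothetical single-device splits:
# effective_batch = micro_batch * accumulation_steps
splits = [(m, effective_batch // m) for m in (4, 8, 16, 32)]
for micro_batch, accumulation_steps in splits:
    assert micro_batch * accumulation_steps == effective_batch
    print(micro_batch, accumulation_steps)
```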

Inference

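import torch
import torchvision.transforms as T
from PIL import Image

from train_6net_local import SixNet, Config

# Rebuild the model and load the trained weights on CPU.
model = SixNet(Config())
ckpt = torch.load("6net_final.pt", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
model.eval()  # switch dropout and batch-norm layers to inference behavior

# ImageNet-style preprocessing: resize, scale to [0, 1], normalize per channel.
tf = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# convert("RGB") guards against grayscale or RGBA inputs.
img = tf(Image.open("cam.jpg").convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)
jts = torch.zeros(1, 6)  # current joint angles (rad)

with torch.no_grad():
    action = model(img, jts)  # (1, 6) predicted targets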

