---
license: apache-2.0
library_name: transformers
tags:
- robotics
- haptics
- embeddings
- multimodal
- encoder
pipeline_tag: feature-extraction
---
# Motoko Embedding 1B
Motoko Embedding 1B is a foundation embedding model for haptic signal representation in robotics.
It encodes raw force, torque, pressure, and vibration signals into fixed-dimension embedding vectors for retrieval, semantic search, and cross-modal fusion.
## Model Summary
- Model type: Encoder-only Transformer
- Parameters: 1B
- Input: Force, torque, pressure, vibration sequences
- Output: Fixed-dimension embedding vectors
- License: Apache 2.0
## Intended Uses
- Semantic search over haptic datasets
- Cross-modal alignment with vision and language
- Haptic RAG pipelines for robotic agents
- Dataset indexing and similarity clustering
- Downstream fine-tuning with LoRA adapters
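As a feel for the semantic-search use case, the sketch below ranks a small in-memory set of haptic embeddings by cosine similarity. It is a minimal illustration with placeholder data: the 1024-dimensional embeddings and the corpus are hypothetical, and the real embedding size is set by `config.json`. See [`examples/embedding_search.py`](./examples/embedding_search.py) for the full workflow.

```python
import numpy as np

def cosine_search(query: np.ndarray, corpus: np.ndarray, top_k: int = 5):
    """Rank corpus rows by cosine similarity to a query embedding.

    query:  (D,)   embedding of one haptic clip
    corpus: (N, D) matrix of precomputed haptic embeddings
    """
    # L2-normalize so a plain dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:top_k]
    return top, scores[top]

# Hypothetical corpus: 100 clips with 1024-dim embeddings
corpus = np.random.randn(100, 1024).astype("float32")
query = np.random.randn(1024).astype("float32")
indices, scores = cosine_search(query, corpus)
```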
## Architecture
Motoko Embedding 1B uses a signal-aware preprocessing stack followed by an encoder-only Transformer.
Multichannel sensor streams are windowed, normalized, projected into token embeddings, and aggregated into a single fixed-size embedding representation.
Key design points:
- Temporal patching over multiaxis haptic sequences
- Rotary position embeddings for long-context signal modeling
- Mean pooling over the final hidden states for embedding extraction
- Optional projection head for cross-modal alignment
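The pooling step can be written out directly. Below is a minimal sketch of masked mean pooling, assuming a transformers-style encoder that returns per-patch hidden states alongside an attention mask; the function name is illustrative, not part of this repository's API.

```python
import torch

def mean_pool(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Masked mean pooling over final hidden states.

    last_hidden:    (batch, seq_len, hidden) encoder outputs
    attention_mask: (batch, seq_len), 1 for real patches, 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden.dtype)  # (B, T, 1)
    summed = (last_hidden * mask).sum(dim=1)                   # padded patches drop out
    counts = mask.sum(dim=1).clamp(min=1e-9)                   # guard against empty masks
    return summed / counts                                     # (B, hidden)
```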
## Input Format
The model expects synchronized haptic sequences containing one or more of the following modalities:
- Force
- Torque
- Pressure
- Vibration
Default sensor assumptions are defined in [`configs/sensor_config.yaml`](./configs/sensor_config.yaml).
Signal normalization and windowing parameters are defined in [`preprocessor/preprocessor_config.json`](./preprocessor/preprocessor_config.json).
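As a concrete illustration, the snippet below assembles one synchronized time-major array from per-modality streams. The 12-channel layout (3-axis force, 3-axis torque, 3 pressure taxels, 3-axis vibration) is hypothetical; the actual channel counts and ordering come from `configs/sensor_config.yaml`.

```python
import numpy as np

T = 1024  # timesteps per window (placeholder; see preprocessor_config.json)

# Hypothetical per-modality streams, already resampled to a common clock
force     = np.random.randn(T, 3).astype("float32")  # Fx, Fy, Fz
torque    = np.random.randn(T, 3).astype("float32")  # Tx, Ty, Tz
pressure  = np.random.randn(T, 3).astype("float32")  # taxel readings
vibration = np.random.randn(T, 3).astype("float32")  # accelerometer axes

# Channels stacked along the last axis, matching the (1024, 12) example in Usage
sample = np.concatenate([force, torque, pressure, vibration], axis=1)
assert sample.shape == (1024, 12)
```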
## Repository Layout
```text
.
├── README.md
├── config.json
├── tokenizer_config.json
├── tokenizer.json
├── model/
│   ├── model.safetensors
│   └── model.safetensors.index.json
├── preprocessor/
│   ├── preprocessor_config.json
│   └── feature_extractor.py
├── configs/
│   ├── training_config.yaml
│   └── sensor_config.yaml
├── examples/
│   ├── inference.py
│   ├── embedding_search.py
│   └── cross_modal.py
└── .gitattributes
```
## Key Files
| File | Purpose |
| --- | --- |
| `config.json` | Encoder architecture: layers, heads, hidden size, projection dimensions |
| `configs/sensor_config.yaml` | Sensor input specs: axes, sequence length, sampling rate |
| `preprocessor/preprocessor_config.json` | Signal normalization, windowing, padding behavior |
| `preprocessor/feature_extractor.py` | Converts raw haptic arrays into encoder-ready tensors |
| `examples/embedding_search.py` | Vector similarity search over haptic embeddings |
| `examples/cross_modal.py` | Aligns haptic embeddings with vision or language vectors |
## Usage
### Load the processor
```python
from preprocessor.feature_extractor import HapticFeatureExtractor
extractor = HapticFeatureExtractor.from_pretrained(".")
```
### Basic embedding inference
```python
import numpy as np

from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")

# Dummy haptic clip: 1024 timesteps across 12 synchronized sensor channels
sample = np.random.randn(1024, 12).astype("float32")

# Windowing, normalization, and padding follow preprocessor_config.json
features = extractor(sample)
print(features["input_values"].shape)
print(features["attention_mask"].shape)
```
See [`examples/inference.py`](./examples/inference.py) for a complete example.
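### From features to embeddings
To turn extracted features into an embedding, a sketch along these lines should work, assuming the checkpoint loads through the standard `transformers` interface (`trust_remote_code=True` if the encoder class is custom) and that outputs expose `last_hidden_state`; exact argument and output names may differ.

```python
import numpy as np
import torch
from transformers import AutoModel

from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")
model = AutoModel.from_pretrained(".", trust_remote_code=True)
model.eval()

sample = np.random.randn(1024, 12).astype("float32")
features = extractor(sample)

input_values = torch.as_tensor(features["input_values"])
attention_mask = torch.as_tensor(features["attention_mask"])

with torch.no_grad():
    outputs = model(input_values=input_values, attention_mask=attention_mask)

# Masked mean pooling over final hidden states (see Architecture)
mask = attention_mask.unsqueeze(-1).to(outputs.last_hidden_state.dtype)
embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embedding.shape)  # (batch, hidden_size)
```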
## Training
Baseline training parameters are provided in [`configs/training_config.yaml`](./configs/training_config.yaml).
These values are intended as a starting point for pretraining or continued domain adaptation, not as a claim of the exact recipe used for a released checkpoint.
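For the LoRA fine-tuning path listed under Intended Uses, a minimal sketch with the `peft` library might look like this. The `target_modules` names are assumptions about the encoder's attention projections; inspect the loaded model's module names before adapting them.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

model = AutoModel.from_pretrained(".", trust_remote_code=True)

# Hypothetical target modules; list model.named_modules() to find
# the actual attention projection layer names in this encoder
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```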
## Limitations
- Performance depends heavily on sensor calibration and synchronization quality.
- Out-of-distribution hardware setups may require updated preprocessing statistics.
- Cross-modal alignment quality depends on the paired supervision used during training.
- This repository scaffold does not include production weights.
## Citation
```bibtex
@misc{motoko_embedding_1b,
  title        = {Motoko Embedding 1B},
  author       = {Motoko},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/}}
}
```