---
license: apache-2.0
library_name: transformers
tags:
  - robotics
  - haptics
  - embeddings
  - multimodal
  - encoder
pipeline_tag: feature-extraction
---

# Motoko Embedding 1B

Motoko Embedding 1B is a foundation embedding model for haptic signal representation in robotics.
It encodes raw force, torque, pressure, and vibration signals into fixed-dimension vector embeddings suitable for retrieval, search, and cross-modal fusion.

## Model Summary

- Model type: Encoder-only Transformer
- Parameters: 1B
- Input: Force, torque, pressure, vibration sequences
- Output: Fixed-dimension embedding vectors
- License: Apache 2.0

## Intended Uses

- Semantic search over haptic datasets
- Cross-modal alignment with vision and language
- Haptic RAG pipelines for robotic agents
- Dataset indexing and similarity clustering
- Downstream fine-tuning with LoRA adapters

## Architecture

Motoko Embedding 1B uses a signal-aware preprocessing stack followed by an encoder-only Transformer.
Multichannel sensor streams are windowed, normalized, projected into token embeddings, and aggregated into a single fixed-size embedding representation.

Key design points:

- Temporal patching over multiaxis haptic sequences
- Rotary position embeddings for long-context signal modeling
- Mean pooling over the final hidden states for embedding extraction
- Optional projection head for cross-modal alignment
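The mean-pooling step can be illustrated with a minimal NumPy sketch. This is an assumption about the pooling logic implied above (masked mean over the final hidden states), not the model's actual implementation; the array shapes are illustrative:

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Masked mean pooling over the time axis.

    hidden_states: (seq_len, hidden_size) final encoder states.
    attention_mask: (seq_len,) with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=0)
    count = np.clip(mask.sum(), 1e-9, None)  # avoid division by zero
    return summed / count

# Toy example: 8 timesteps, hidden size 4, last 4 steps are padding.
hidden = np.ones((8, 4), dtype="float32")
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0])
emb = mean_pool(hidden, mask)
print(emb.shape)  # (4,)
```

Padding positions are excluded from the average so variable-length windows map to comparable embeddings.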

## Input Format

The model expects synchronized haptic sequences containing one or more of the following modalities:

- Force
- Torque
- Pressure
- Vibration

Default sensor assumptions are defined in [`configs/sensor_config.yaml`](./configs/sensor_config.yaml).
Signal normalization and windowing parameters are defined in [`preprocessor/preprocessor_config.json`](./preprocessor/preprocessor_config.json).
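As a sketch of what "synchronized" means in practice, the modalities can be resampled to a shared clock and stacked channel-wise into a single `(time, channels)` array. The channel counts below (3-axis force, 3-axis torque, 1 pressure, 3-axis vibration) are hypothetical; the actual layout is defined in `configs/sensor_config.yaml`:

```python
import numpy as np

T = 1024  # window length in samples (illustrative)

# Hypothetical per-modality streams, already resampled to the same clock.
force = np.random.randn(T, 3).astype("float32")
torque = np.random.randn(T, 3).astype("float32")
pressure = np.random.randn(T, 1).astype("float32")
vibration = np.random.randn(T, 3).astype("float32")

# Stack channels along the last axis to form one synchronized sequence.
sample = np.concatenate([force, torque, pressure, vibration], axis=1)
print(sample.shape)  # (1024, 10)
```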

## Repository Layout

```text
.
β”œβ”€β”€ README.md
β”œβ”€β”€ config.json
β”œβ”€β”€ tokenizer_config.json
β”œβ”€β”€ tokenizer.json
β”œβ”€β”€ model/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   └── model.safetensors.index.json
β”œβ”€β”€ preprocessor/
β”‚   β”œβ”€β”€ preprocessor_config.json
β”‚   └── feature_extractor.py
β”œβ”€β”€ configs/
β”‚   β”œβ”€β”€ training_config.yaml
β”‚   └── sensor_config.yaml
β”œβ”€β”€ examples/
β”‚   β”œβ”€β”€ inference.py
β”‚   β”œβ”€β”€ embedding_search.py
β”‚   └── cross_modal.py
└── .gitattributes
```

## Key Files

| File | Purpose |
| --- | --- |
| `config.json` | Encoder architecture: layers, heads, hidden size, projection dimensions |
| `configs/sensor_config.yaml` | Sensor input specs: axes, sequence length, sampling rate |
| `preprocessor/preprocessor_config.json` | Signal normalization, windowing, padding behavior |
| `preprocessor/feature_extractor.py` | Converts raw haptic arrays into encoder-ready tensors |
| `examples/embedding_search.py` | Vector similarity search over haptic embeddings |
| `examples/cross_modal.py` | Aligns haptic embeddings with vision or language vectors |

## Usage

### Load the processor

```python
from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")
```

### Basic embedding inference

```python
import numpy as np

from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")
sample = np.random.randn(1024, 12).astype("float32")
features = extractor(sample)

print(features["input_values"].shape)
print(features["attention_mask"].shape)
```

See [`examples/inference.py`](./examples/inference.py) for a complete example.
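### Similarity search over embeddings

For retrieval use cases, embeddings are typically compared with cosine similarity. The sketch below assumes you already have a matrix of embeddings (here random vectors stand in for real ones); `examples/embedding_search.py` is the repository's own reference implementation:

```python
import numpy as np

def cosine_search(query: np.ndarray, index: np.ndarray, top_k: int = 3):
    """Return indices and scores of the top_k most similar embeddings."""
    q = query / np.linalg.norm(query)
    db = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = db @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# Stand-in index of 100 64-dim embeddings; the query is a noisy copy of entry 42.
rng = np.random.default_rng(0)
index = rng.standard_normal((100, 64)).astype("float32")
query = index[42] + 0.01 * rng.standard_normal(64).astype("float32")

ids, scores = cosine_search(query, index)
print(ids[0])  # 42
```

Normalizing both sides first makes the dot product equal to cosine similarity, so the index can also be served from any vector database that supports inner-product search.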

## Training

Baseline training parameters are provided in [`configs/training_config.yaml`](./configs/training_config.yaml).
These values are intended as a starting point for pretraining or continued domain adaptation, not as a claim of the exact recipe used for a released checkpoint.

## Limitations

- Performance depends heavily on sensor calibration and synchronization quality.
- Out-of-distribution hardware setups may require updated preprocessing statistics.
- Cross-modal alignment quality depends on the paired supervision used during training.
- This repository scaffold does not include production weights.

## Citation

```bibtex
@misc{motoko_embedding_1b,
  title        = {Motoko Embedding 1B},
  author       = {Motoko},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/}}
}
```