hrudu committed
Commit 14e9a9f · 1 Parent(s): 92e412a

Add Hugging Face model scaffold

README.md CHANGED

---
license: apache-2.0
library_name: transformers
tags:
- robotics
- haptics
- embeddings
- multimodal
- encoder
pipeline_tag: feature-extraction
---

# Motoko Embedding 1B

Motoko Embedding 1B is a foundation embedding model for haptic signal representation in robotics.
It encodes raw force, torque, pressure, and vibration signals into fixed-dimension vector embeddings for retrieval, search, and cross-modal fusion.

## Model Summary

- Model type: Encoder-only Transformer
- Parameters: 1B
- Input: Force, torque, pressure, vibration sequences
- Output: Fixed-dimension embedding vectors
- License: Apache 2.0

## Intended Uses

- Semantic search over haptic datasets
- Cross-modal alignment with vision and language
- Haptic RAG pipelines for robotic agents
- Dataset indexing and similarity clustering
- Downstream fine-tuning with LoRA adapters

## Architecture

Motoko Embedding 1B uses a signal-aware preprocessing stack followed by an encoder-only Transformer.
Multichannel sensor streams are windowed, normalized, projected into token embeddings, and aggregated into a single fixed-size embedding.

Key design points (sketched in code below):

- Temporal patching over multi-axis haptic sequences
- Rotary position embeddings for long-context signal modeling
- Mean pooling over the final hidden states for embedding extraction
- Optional projection head for cross-modal alignment

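The patch-and-pool path can be illustrated in a few lines of NumPy. This is a minimal sketch with random stand-in weights and the shapes from `config.json` (`patch_size: 16`, `num_input_channels: 12`, `hidden_size: 2048`); it is not the model's actual layers.

```python
import numpy as np

seq_len, channels, patch, hidden = 1024, 12, 16, 2048
signal = np.random.randn(seq_len, channels).astype(np.float32)

# Temporal patching: fold 16 consecutive timesteps into one token.
tokens = signal.reshape(seq_len // patch, patch * channels)  # [64, 192]

# Stand-in for the learned input projection into the encoder width.
w_in = (0.02 * np.random.randn(patch * channels, hidden)).astype(np.float32)
hidden_states = tokens @ w_in  # [64, 2048]; the real model runs these through 24 layers

# Mean pooling over valid tokens using a token-level attention mask.
token_mask = np.ones(tokens.shape[0], dtype=np.float32)
embedding = (hidden_states * token_mask[:, None]).sum(axis=0) / token_mask.sum()
print(embedding.shape)  # (2048,) before the optional projection head
```
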
## Input Format

The model expects synchronized haptic sequences containing one or more of the following modalities:

- Force
- Torque
- Pressure
- Vibration

Default sensor assumptions are defined in [`configs/sensor_config.yaml`](./configs/sensor_config.yaml).
Signal normalization and windowing parameters are defined in [`preprocessor/preprocessor_config.json`](./preprocessor/preprocessor_config.json).

## Repository Layout

```text
.
├── README.md
├── config.json
├── tokenizer_config.json
├── tokenizer.json
├── model/
│   ├── model.safetensors
│   └── model.safetensors.index.json
├── preprocessor/
│   ├── preprocessor_config.json
│   └── feature_extractor.py
├── configs/
│   ├── training_config.yaml
│   └── sensor_config.yaml
├── examples/
│   ├── inference.py
│   ├── embedding_search.py
│   └── cross_modal.py
└── .gitattributes
```

## Key Files

| File | Purpose |
| --- | --- |
| `config.json` | Encoder architecture: layers, heads, hidden size, projection dimensions |
| `configs/sensor_config.yaml` | Sensor input specs: axes, sequence length, sampling rate |
| `preprocessor/preprocessor_config.json` | Signal normalization, windowing, padding behavior |
| `preprocessor/feature_extractor.py` | Converts raw haptic arrays into encoder-ready tensors |
| `examples/embedding_search.py` | Vector similarity search over haptic embeddings |
| `examples/cross_modal.py` | Aligns haptic embeddings with vision or language vectors |

## Usage

### Load the feature extractor

```python
from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")
```

### Basic embedding inference

```python
import numpy as np

from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")

# 1024 timesteps of synthetic 12-channel haptic data.
sample = np.random.randn(1024, 12).astype("float32")
features = extractor(sample)

print(features["input_values"].shape)    # (1024, 12)
print(features["attention_mask"].shape)  # (1024,)
```

See [`examples/inference.py`](./examples/inference.py) for a complete example.

## Training

Baseline training parameters are provided in [`configs/training_config.yaml`](./configs/training_config.yaml).
These values are intended as a starting point for pretraining or continued domain adaptation, not as a claim of the exact recipe used for a released checkpoint.

## Limitations

- Performance depends heavily on sensor calibration and synchronization quality.
- Out-of-distribution hardware setups may require updated preprocessing statistics.
- Cross-modal alignment quality depends on the paired supervision used during training.
- This repository scaffold does not include production weights.

## Citation

```bibtex
@misc{motoko_embedding_1b,
  title = {Motoko Embedding 1B},
  author = {Motoko},
  year = {2026},
  howpublished = {\url{https://huggingface.co/}}
}
```

config.json ADDED

{
  "architectures": [
    "MotokoEmbeddingModel"
  ],
  "model_type": "motoko-haptic-encoder",
  "hidden_size": 2048,
  "intermediate_size": 8192,
  "num_hidden_layers": 24,
  "num_attention_heads": 16,
  "num_key_value_heads": 16,
  "max_position_embeddings": 4096,
  "hidden_act": "silu",
  "hidden_dropout_prob": 0.0,
  "attention_probs_dropout_prob": 0.0,
  "layer_norm_eps": 1e-05,
  "initializer_range": 0.02,
  "rope_theta": 10000.0,
  "embedding_dim": 1024,
  "projection_dim": 1024,
  "pooling_type": "mean",
  "patch_size": 16,
  "num_input_channels": 12,
  "torch_dtype": "float16",
  "transformers_version": "4.46.0"
}

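As a sanity check on the advertised parameter count, here is a back-of-the-envelope estimate from these hyperparameters. It assumes a standard attention block and a plain two-matrix SiLU MLP; the actual MotokoEmbeddingModel layout is not published in this scaffold.

```python
# Rough encoder parameter estimate from config.json; assumes a
# standard attention block (q, k, v, o) and a two-matrix SiLU MLP.
hidden, inter, layers = 2048, 8192, 24
attn = 4 * hidden * hidden     # ~16.8M per layer
mlp = 2 * hidden * inter       # ~33.6M per layer
total = layers * (attn + mlp)  # ~1.21B, consistent with the "1B" label
print(f"~{total / 1e9:.2f}B parameters")
```
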
configs/sensor_config.yaml ADDED

sensors:
  force:
    axes: [x, y, z]
    units: newton
  torque:
    axes: [x, y, z]
    units: newton_meter
  pressure:
    channels: 2
    units: pascal
  vibration:
    channels: 2
    units: arbitrary
  contact:
    channels: 2
    units: binary

input:
  total_channels: 12
  sequence_length: 1024
  sampling_rate_hz: 1000
  patch_size: 16
  padding_side: right

alignment:
  timestamp_sync: required
  missing_value_policy: zero_pad

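A sketch of how a 12-channel input array might be assembled from these sensors. The channel ordering follows the names in `tokenizer.json` (force, torque, pressure, vibration, contact) and is an assumption of this example, as are the random stand-in streams.

```python
import numpy as np

T = 1024  # sequence_length at sampling_rate_hz = 1000, i.e. ~1 s of data

# Stand-in streams; replace with real, timestamp-synchronized recordings.
force = np.random.randn(T, 3).astype(np.float32)      # force_x/y/z (N)
torque = np.random.randn(T, 3).astype(np.float32)     # torque_x/y/z (N*m)
pressure = np.random.randn(T, 2).astype(np.float32)   # pressure_0/1 (Pa)
vibration = np.random.randn(T, 2).astype(np.float32)  # vibration_0/1
contact = np.zeros((T, 2), dtype=np.float32)          # contact on/off flags

sample = np.concatenate([force, torque, pressure, vibration, contact], axis=1)
assert sample.shape == (T, 12)  # matches input.total_channels
```
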
configs/training_config.yaml ADDED

model:
  name: motoko-embedding-1-1b
  embedding_dim: 1024
  pooling_type: mean

training:
  seed: 42
  epochs: 20
  max_steps: 500000
  per_device_train_batch_size: 16
  per_device_eval_batch_size: 16
  gradient_accumulation_steps: 8
  learning_rate: 2.0e-4
  min_learning_rate: 2.0e-5
  weight_decay: 0.01
  warmup_ratio: 0.03
  max_grad_norm: 1.0
  precision: bf16
  gradient_checkpointing: true

data:
  train_manifest: data/train.jsonl
  eval_manifest: data/eval.jsonl
  sequence_length: 1024
  num_channels: 12
  sampling_rate_hz: 1000

objectives:
  contrastive_loss_weight: 1.0
  reconstruction_loss_weight: 0.0
  cross_modal_alignment_weight: 0.5

optimizer:
  type: adamw
  betas: [0.9, 0.95]
  eps: 1.0e-8

logging:
  report_to:
    - tensorboard
  logging_steps: 20
  eval_steps: 1000
  save_steps: 1000
  save_total_limit: 5

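The batch and schedule arithmetic these values imply, assuming a single eight-GPU node (the device count is an assumption, not part of the config):

```python
# Effective batch and warmup arithmetic from training_config.yaml.
per_device, grad_accum, devices = 16, 8, 8   # devices is hypothetical
seq_len, max_steps, warmup_ratio = 1024, 500_000, 0.03

effective_batch = per_device * grad_accum * devices  # 1024 sequences/step
timesteps_per_step = effective_batch * seq_len       # 1,048,576 sensor timesteps
warmup_steps = int(warmup_ratio * max_steps)         # 15,000 steps

print(effective_batch, timesteps_per_step, warmup_steps)
```
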
examples/cross_modal.py ADDED

import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def main():
    embedding_dim = 1024

    # Placeholder embeddings; in practice these come from the haptic
    # encoder and a paired vision/language encoder.
    haptic_embedding = l2_normalize(np.random.randn(1, embedding_dim).astype(np.float32))
    vision_embedding = l2_normalize(np.random.randn(1, embedding_dim).astype(np.float32))

    # Dot product of unit vectors equals cosine similarity.
    similarity = float((haptic_embedding * vision_embedding).sum())

    print("Cross-modal similarity:", similarity)


if __name__ == "__main__":
    main()

examples/embedding_search.py ADDED

import numpy as np


def cosine_similarity(query, matrix):
    # Normalize both sides so the matrix-vector product yields cosine scores.
    query_norm = query / np.linalg.norm(query)
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix_norm @ query_norm


def main():
    embedding_dim = 1024

    # Random stand-ins for a query embedding and a small index of five vectors.
    query = np.random.randn(embedding_dim).astype(np.float32)
    index = np.random.randn(5, embedding_dim).astype(np.float32)

    scores = cosine_similarity(query, index)
    ranked = np.argsort(scores)[::-1]  # best match first

    print("Nearest neighbors:", ranked.tolist())
    print("Scores:", scores[ranked].tolist())


if __name__ == "__main__":
    main()

examples/inference.py ADDED

import numpy as np

from preprocessor.feature_extractor import HapticFeatureExtractor


def main():
    extractor = HapticFeatureExtractor.from_pretrained(".")

    # 1024 timesteps of synthetic 12-channel haptic data.
    sample = np.random.randn(1024, 12).astype(np.float32)
    features = extractor(sample)

    print("input_values:", features["input_values"].shape)
    print("attention_mask:", features["attention_mask"].shape)
    print("This scaffold does not include a runnable model checkpoint.")


if __name__ == "__main__":
    main()

model/model.safetensors ADDED
File without changes
model/model.safetensors.index.json ADDED

{
  "metadata": {
    "total_size": 0
  },
  "weight_map": {}
}

preprocessor/feature_extractor.py ADDED

import json
from pathlib import Path

import numpy as np


class HapticFeatureExtractor:
    """Turns raw [sequence_length, num_channels] haptic arrays into
    normalized, fixed-window tensors with an optional attention mask."""

    def __init__(self, config):
        self.config = config
        self.window_size = int(config["window_size"])
        self.padding_value = float(config.get("padding_value", 0.0))
        self.return_attention_mask = bool(config.get("return_attention_mask", True))
        normalization = config.get("normalization", {})
        self.mean = np.asarray(normalization.get("mean", []), dtype=np.float32)
        self.std = np.asarray(normalization.get("std", []), dtype=np.float32)

    @classmethod
    def from_pretrained(cls, root):
        # Load preprocessor/preprocessor_config.json relative to the repo root.
        root_path = Path(root)
        config_path = root_path / "preprocessor" / "preprocessor_config.json"
        with config_path.open("r", encoding="utf-8") as handle:
            config = json.load(handle)
        return cls(config)

    def _normalize(self, values):
        # Per-channel z-scoring; skipped when disabled or stats are absent.
        if not self.config.get("normalize", True):
            return values
        if self.mean.size == 0 or self.std.size == 0:
            return values
        denom = np.where(self.std == 0, 1.0, self.std)
        return (values - self.mean) / denom

    def __call__(self, values):
        values = np.asarray(values, dtype=np.float32)
        if values.ndim != 2:
            raise ValueError("Expected input shape [sequence_length, num_channels].")

        values = self._normalize(values)
        length, channels = values.shape

        if length >= self.window_size:
            # Truncate on the right to the configured window.
            trimmed = values[: self.window_size]
            attention_mask = np.ones(self.window_size, dtype=np.int64)
        else:
            # Right-pad with padding_value and mask out the padded steps.
            pad_amount = self.window_size - length
            padding = np.full((pad_amount, channels), self.padding_value, dtype=np.float32)
            trimmed = np.concatenate([values, padding], axis=0)
            attention_mask = np.concatenate(
                [
                    np.ones(length, dtype=np.int64),
                    np.zeros(pad_amount, dtype=np.int64),
                ]
            )

        result = {"input_values": trimmed}
        if self.return_attention_mask:
            result["attention_mask"] = attention_mask
        return result

preprocessor/preprocessor_config.json ADDED

{
  "feature_extractor_type": "HapticFeatureExtractor",
  "sampling_rate_hz": 1000,
  "window_size": 1024,
  "hop_length": 256,
  "padding_value": 0.0,
  "padding_side": "right",
  "return_attention_mask": true,
  "normalize": true,
  "normalization": {
    "mean": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "std": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
  }
}

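The shipped mean/std are identity placeholders. A sketch of recomputing them from your own recordings and writing them back, where `recordings` is a hypothetical list of [T, 12] float arrays from your sensors:

```python
import json

import numpy as np

# Hypothetical calibration set: a list of [T, 12] float arrays.
recordings = [np.random.randn(1024, 12).astype(np.float32) for _ in range(8)]
stacked = np.concatenate(recordings, axis=0)

with open("preprocessor/preprocessor_config.json", "r", encoding="utf-8") as f:
    cfg = json.load(f)

# Per-channel statistics, matching the extractor's z-score normalization.
cfg["normalization"]["mean"] = stacked.mean(axis=0).tolist()
cfg["normalization"]["std"] = stacked.std(axis=0).tolist()

with open("preprocessor/preprocessor_config.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2)
```
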
tokenizer.json ADDED

{
  "version": "1.0",
  "truncation": null,
  "padding": null,
  "added_tokens": [
    {
      "id": 0,
      "content": "<pad>",
      "special": true
    },
    {
      "id": 1,
      "content": "<unk>",
      "special": true
    }
  ],
  "normalizer": {
    "type": "Sequence",
    "normalizers": []
  },
  "pre_tokenizer": {
    "type": "WhitespaceSplit"
  },
  "model": {
    "type": "WordLevel",
    "vocab": {
      "<pad>": 0,
      "<unk>": 1,
      "force_x": 2,
      "force_y": 3,
      "force_z": 4,
      "torque_x": 5,
      "torque_y": 6,
      "torque_z": 7,
      "pressure_0": 8,
      "pressure_1": 9,
      "vibration_0": 10,
      "vibration_1": 11,
      "contact_on": 12,
      "contact_off": 13
    },
    "unk_token": "<unk>"
  }
}

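The vocabulary is a word-level map over channel names rather than a text tokenizer. A dependency-free sketch of the lookup it implies, mirroring whitespace-split WordLevel behavior without using the `tokenizers` library:

```python
import json

with open("tokenizer.json", "r", encoding="utf-8") as f:
    tok = json.load(f)

vocab = tok["model"]["vocab"]
unk_id = vocab[tok["model"]["unk_token"]]


def encode(text):
    # Whitespace split + word-level lookup; unknown names map to <unk>.
    return [vocab.get(word, unk_id) for word in text.split()]


print(encode("force_x torque_z contact_on"))  # [2, 7, 12]
```
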
tokenizer_config.json ADDED

{
  "tokenizer_class": "SignalTokenizer",
  "model_input_names": [
    "input_values",
    "attention_mask"
  ],
  "padding_side": "right",
  "truncation_side": "right",
  "pad_token": "<pad>",
  "unk_token": "<unk>",
  "max_length": 4096,
  "do_normalize": true
}