hrudu committed
Commit 14e9a9f · 1 Parent(s): 92e412a

Add Hugging Face model scaffold

README.md CHANGED

---
license: apache-2.0
library_name: transformers
tags:
- robotics
- haptics
- embeddings
- multimodal
- encoder
pipeline_tag: feature-extraction
---

# Motoko Embedding 1B

Motoko Embedding 1B is a foundation embedding model for haptic signal representation in robotics.
It encodes raw force, torque, pressure, and vibration signals into fixed-dimension vector embeddings for retrieval, search, and cross-modal fusion.

## Model Summary

- Model type: Encoder-only Transformer
- Parameters: 1B
- Input: Force, torque, pressure, vibration sequences
- Output: Fixed-dimension embedding vectors
- License: Apache 2.0

## Intended Uses

- Semantic search over haptic datasets
- Cross-modal alignment with vision and language
- Haptic RAG pipelines for robotic agents
- Dataset indexing and similarity clustering
- Downstream fine-tuning with LoRA adapters

## Architecture

Motoko Embedding 1B uses a signal-aware preprocessing stack followed by an encoder-only Transformer.
Multichannel sensor streams are windowed, normalized, projected into token embeddings, and aggregated into a single fixed-size embedding.

Key design points (sketched in code below):

- Temporal patching over multi-axis haptic sequences
- Rotary position embeddings for long-context signal modeling
- Mean pooling over the final hidden states for embedding extraction
- Optional projection head for cross-modal alignment

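The patch-and-pool path can be illustrated in a few lines of NumPy. This is a minimal sketch with random stand-in weights and the shapes from `config.json` (`patch_size: 16`, `num_input_channels: 12`, `hidden_size: 2048`); it is not the model's actual layers.

```python
import numpy as np

seq_len, channels, patch, hidden = 1024, 12, 16, 2048
signal = np.random.randn(seq_len, channels).astype(np.float32)

# Temporal patching: fold 16 consecutive timesteps into one token.
tokens = signal.reshape(seq_len // patch, patch * channels)  # [64, 192]

# Stand-in for the learned input projection into the encoder width.
w_in = (0.02 * np.random.randn(patch * channels, hidden)).astype(np.float32)
hidden_states = tokens @ w_in  # [64, 2048]; the real model runs these through 24 layers

# Mean pooling over valid tokens using a token-level attention mask.
token_mask = np.ones(tokens.shape[0], dtype=np.float32)
embedding = (hidden_states * token_mask[:, None]).sum(axis=0) / token_mask.sum()
print(embedding.shape)  # (2048,) before the optional projection head
```
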
## Input Format

The model expects synchronized haptic sequences containing one or more of the following modalities:

- Force
- Torque
- Pressure
- Vibration

Default sensor assumptions are defined in [`configs/sensor_config.yaml`](./configs/sensor_config.yaml).
Signal normalization and windowing parameters are defined in [`preprocessor/preprocessor_config.json`](./preprocessor/preprocessor_config.json).

## Repository Layout

```text
.
├── README.md
├── config.json
├── tokenizer_config.json
├── tokenizer.json
├── model/
│   ├── model.safetensors
│   └── model.safetensors.index.json
├── preprocessor/
│   ├── preprocessor_config.json
│   └── feature_extractor.py
├── configs/
│   ├── training_config.yaml
│   └── sensor_config.yaml
├── examples/
│   ├── inference.py
│   ├── embedding_search.py
│   └── cross_modal.py
└── .gitattributes
```

## Key Files

| File | Purpose |
| --- | --- |
| `config.json` | Encoder architecture: layers, heads, hidden size, projection dimensions |
| `configs/sensor_config.yaml` | Sensor input specs: axes, sequence length, sampling rate |
| `preprocessor/preprocessor_config.json` | Signal normalization, windowing, padding behavior |
| `preprocessor/feature_extractor.py` | Converts raw haptic arrays into encoder-ready tensors |
| `examples/embedding_search.py` | Vector similarity search over haptic embeddings |
| `examples/cross_modal.py` | Aligns haptic embeddings with vision or language vectors |

## Usage

### Load the feature extractor

```python
from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")
```

### Basic embedding inference

```python
import numpy as np

from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")

# 1024 timesteps of synthetic 12-channel haptic data.
sample = np.random.randn(1024, 12).astype("float32")
features = extractor(sample)

print(features["input_values"].shape)    # (1024, 12)
print(features["attention_mask"].shape)  # (1024,)
```

See [`examples/inference.py`](./examples/inference.py) for a complete example.

## Training

Baseline training parameters are provided in [`configs/training_config.yaml`](./configs/training_config.yaml).
These values are intended as a starting point for pretraining or continued domain adaptation, not as a claim of the exact recipe used for a released checkpoint.

## Limitations

- Performance depends heavily on sensor calibration and synchronization quality.
- Out-of-distribution hardware setups may require updated preprocessing statistics.
- Cross-modal alignment quality depends on the paired supervision used during training.
- This repository scaffold does not include production weights.

## Citation

```bibtex
@misc{motoko_embedding_1b,
  title = {Motoko Embedding 1B},
  author = {Motoko},
  year = {2026},
  howpublished = {\url{https://huggingface.co/}}
}
```

config.json ADDED

{
  "architectures": [
    "MotokoEmbeddingModel"
  ],
  "model_type": "motoko-haptic-encoder",
  "hidden_size": 2048,
  "intermediate_size": 8192,
  "num_hidden_layers": 24,
  "num_attention_heads": 16,
  "num_key_value_heads": 16,
  "max_position_embeddings": 4096,
  "hidden_act": "silu",
  "hidden_dropout_prob": 0.0,
  "attention_probs_dropout_prob": 0.0,
  "layer_norm_eps": 1e-05,
  "initializer_range": 0.02,
  "rope_theta": 10000.0,
  "embedding_dim": 1024,
  "projection_dim": 1024,
  "pooling_type": "mean",
  "patch_size": 16,
  "num_input_channels": 12,
  "torch_dtype": "float16",
  "transformers_version": "4.46.0"
}

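As a sanity check on the advertised parameter count, here is a back-of-the-envelope estimate from these hyperparameters. It assumes a standard attention block and a plain two-matrix SiLU MLP; the actual MotokoEmbeddingModel layout is not published in this scaffold.

```python
# Rough encoder parameter estimate from config.json; assumes a
# standard attention block (q, k, v, o) and a two-matrix SiLU MLP.
hidden, inter, layers = 2048, 8192, 24
attn = 4 * hidden * hidden     # ~16.8M per layer
mlp = 2 * hidden * inter       # ~33.6M per layer
total = layers * (attn + mlp)  # ~1.21B, consistent with the "1B" label
print(f"~{total / 1e9:.2f}B parameters")
```
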
configs/sensor_config.yaml ADDED

sensors:
  force:
    axes: [x, y, z]
    units: newton
  torque:
    axes: [x, y, z]
    units: newton_meter
  pressure:
    channels: 2
    units: pascal
  vibration:
    channels: 2
    units: arbitrary
  contact:
    channels: 2
    units: binary

input:
  total_channels: 12
  sequence_length: 1024
  sampling_rate_hz: 1000
  patch_size: 16
  padding_side: right

alignment:
  timestamp_sync: required
  missing_value_policy: zero_pad

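A sketch of how a 12-channel input array might be assembled from these sensors. The channel ordering follows the names in `tokenizer.json` (force, torque, pressure, vibration, contact) and is an assumption of this example, as are the random stand-in streams.

```python
import numpy as np

T = 1024  # sequence_length at sampling_rate_hz = 1000, i.e. ~1 s of data

# Stand-in streams; replace with real, timestamp-synchronized recordings.
force = np.random.randn(T, 3).astype(np.float32)      # force_x/y/z (N)
torque = np.random.randn(T, 3).astype(np.float32)     # torque_x/y/z (N*m)
pressure = np.random.randn(T, 2).astype(np.float32)   # pressure_0/1 (Pa)
vibration = np.random.randn(T, 2).astype(np.float32)  # vibration_0/1
contact = np.zeros((T, 2), dtype=np.float32)          # contact on/off flags

sample = np.concatenate([force, torque, pressure, vibration, contact], axis=1)
assert sample.shape == (T, 12)  # matches input.total_channels
```
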
configs/training_config.yaml ADDED

model:
  name: motoko-embedding-1-1b
  embedding_dim: 1024
  pooling_type: mean

training:
  seed: 42
  epochs: 20
  max_steps: 500000
  per_device_train_batch_size: 16
  per_device_eval_batch_size: 16
  gradient_accumulation_steps: 8
  learning_rate: 2.0e-4
  min_learning_rate: 2.0e-5
  weight_decay: 0.01
  warmup_ratio: 0.03
  max_grad_norm: 1.0
  precision: bf16
  gradient_checkpointing: true

data:
  train_manifest: data/train.jsonl
  eval_manifest: data/eval.jsonl
  sequence_length: 1024
  num_channels: 12
  sampling_rate_hz: 1000

objectives:
  contrastive_loss_weight: 1.0
  reconstruction_loss_weight: 0.0
  cross_modal_alignment_weight: 0.5

optimizer:
  type: adamw
  betas: [0.9, 0.95]
  eps: 1.0e-8

logging:
  report_to:
    - tensorboard
  logging_steps: 20
  eval_steps: 1000
  save_steps: 1000
  save_total_limit: 5

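The batch and schedule arithmetic these values imply, assuming a single eight-GPU node (the device count is an assumption, not part of the config):

```python
# Effective batch and warmup arithmetic from training_config.yaml.
per_device, grad_accum, devices = 16, 8, 8   # devices is hypothetical
seq_len, max_steps, warmup_ratio = 1024, 500_000, 0.03

effective_batch = per_device * grad_accum * devices  # 1024 sequences/step
timesteps_per_step = effective_batch * seq_len       # 1,048,576 sensor timesteps
warmup_steps = int(warmup_ratio * max_steps)         # 15,000 steps

print(effective_batch, timesteps_per_step, warmup_steps)
```
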
examples/cross_modal.py ADDED

import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def main():
    embedding_dim = 1024

    # Placeholder embeddings; in practice these come from the haptic
    # encoder and a paired vision/language encoder.
    haptic_embedding = l2_normalize(np.random.randn(1, embedding_dim).astype(np.float32))
    vision_embedding = l2_normalize(np.random.randn(1, embedding_dim).astype(np.float32))

    # Dot product of unit vectors equals cosine similarity.
    similarity = float((haptic_embedding * vision_embedding).sum())

    print("Cross-modal similarity:", similarity)


if __name__ == "__main__":
    main()

examples/embedding_search.py ADDED

import numpy as np


def cosine_similarity(query, matrix):
    # Normalize both sides so the matrix-vector product yields cosine scores.
    query_norm = query / np.linalg.norm(query)
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix_norm @ query_norm


def main():
    embedding_dim = 1024

    # Random stand-ins for a query embedding and a small index of five vectors.
    query = np.random.randn(embedding_dim).astype(np.float32)
    index = np.random.randn(5, embedding_dim).astype(np.float32)

    scores = cosine_similarity(query, index)
    ranked = np.argsort(scores)[::-1]  # best match first

    print("Nearest neighbors:", ranked.tolist())
    print("Scores:", scores[ranked].tolist())


if __name__ == "__main__":
    main()

examples/inference.py ADDED

import numpy as np

from preprocessor.feature_extractor import HapticFeatureExtractor


def main():
    extractor = HapticFeatureExtractor.from_pretrained(".")

    # 1024 timesteps of synthetic 12-channel haptic data.
    sample = np.random.randn(1024, 12).astype(np.float32)
    features = extractor(sample)

    print("input_values:", features["input_values"].shape)
    print("attention_mask:", features["attention_mask"].shape)
    print("This scaffold does not include a runnable model checkpoint.")


if __name__ == "__main__":
    main()

model/model.safetensors ADDED
File without changes
model/model.safetensors.index.json ADDED

{
  "metadata": {
    "total_size": 0
  },
  "weight_map": {}
}

preprocessor/feature_extractor.py ADDED

import json
from pathlib import Path

import numpy as np


class HapticFeatureExtractor:
    """Turns raw [sequence_length, num_channels] haptic arrays into
    normalized, fixed-window tensors with an optional attention mask."""

    def __init__(self, config):
        self.config = config
        self.window_size = int(config["window_size"])
        self.padding_value = float(config.get("padding_value", 0.0))
        self.return_attention_mask = bool(config.get("return_attention_mask", True))
        normalization = config.get("normalization", {})
        self.mean = np.asarray(normalization.get("mean", []), dtype=np.float32)
        self.std = np.asarray(normalization.get("std", []), dtype=np.float32)

    @classmethod
    def from_pretrained(cls, root):
        # Load preprocessor/preprocessor_config.json relative to the repo root.
        root_path = Path(root)
        config_path = root_path / "preprocessor" / "preprocessor_config.json"
        with config_path.open("r", encoding="utf-8") as handle:
            config = json.load(handle)
        return cls(config)

    def _normalize(self, values):
        # Per-channel z-scoring; skipped when disabled or stats are absent.
        if not self.config.get("normalize", True):
            return values
        if self.mean.size == 0 or self.std.size == 0:
            return values
        denom = np.where(self.std == 0, 1.0, self.std)
        return (values - self.mean) / denom

    def __call__(self, values):
        values = np.asarray(values, dtype=np.float32)
        if values.ndim != 2:
            raise ValueError("Expected input shape [sequence_length, num_channels].")

        values = self._normalize(values)
        length, channels = values.shape

        if length >= self.window_size:
            # Truncate on the right to the configured window.
            trimmed = values[: self.window_size]
            attention_mask = np.ones(self.window_size, dtype=np.int64)
        else:
            # Right-pad with padding_value and mask out the padded steps.
            pad_amount = self.window_size - length
            padding = np.full((pad_amount, channels), self.padding_value, dtype=np.float32)
            trimmed = np.concatenate([values, padding], axis=0)
            attention_mask = np.concatenate(
                [
                    np.ones(length, dtype=np.int64),
                    np.zeros(pad_amount, dtype=np.int64),
                ]
            )

        result = {"input_values": trimmed}
        if self.return_attention_mask:
            result["attention_mask"] = attention_mask
        return result

preprocessor/preprocessor_config.json ADDED

{
  "feature_extractor_type": "HapticFeatureExtractor",
  "sampling_rate_hz": 1000,
  "window_size": 1024,
  "hop_length": 256,
  "padding_value": 0.0,
  "padding_side": "right",
  "return_attention_mask": true,
  "normalize": true,
  "normalization": {
    "mean": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    "std": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
  }
}

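The shipped mean/std are identity placeholders. A sketch of recomputing them from your own recordings and writing them back, where `recordings` is a hypothetical list of [T, 12] float arrays from your sensors:

```python
import json

import numpy as np

# Hypothetical calibration set: a list of [T, 12] float arrays.
recordings = [np.random.randn(1024, 12).astype(np.float32) for _ in range(8)]
stacked = np.concatenate(recordings, axis=0)

with open("preprocessor/preprocessor_config.json", "r", encoding="utf-8") as f:
    cfg = json.load(f)

# Per-channel statistics, matching the extractor's z-score normalization.
cfg["normalization"]["mean"] = stacked.mean(axis=0).tolist()
cfg["normalization"]["std"] = stacked.std(axis=0).tolist()

with open("preprocessor/preprocessor_config.json", "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2)
```
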
tokenizer.json ADDED

{
  "version": "1.0",
  "truncation": null,
  "padding": null,
  "added_tokens": [
    {
      "id": 0,
      "content": "<pad>",
      "special": true
    },
    {
      "id": 1,
      "content": "<unk>",
      "special": true
    }
  ],
  "normalizer": {
    "type": "Sequence",
    "normalizers": []
  },
  "pre_tokenizer": {
    "type": "WhitespaceSplit"
  },
  "model": {
    "type": "WordLevel",
    "vocab": {
      "<pad>": 0,
      "<unk>": 1,
      "force_x": 2,
      "force_y": 3,
      "force_z": 4,
      "torque_x": 5,
      "torque_y": 6,
      "torque_z": 7,
      "pressure_0": 8,
      "pressure_1": 9,
      "vibration_0": 10,
      "vibration_1": 11,
      "contact_on": 12,
      "contact_off": 13
    },
    "unk_token": "<unk>"
  }
}

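The vocabulary is a word-level map over channel names rather than a text tokenizer. A dependency-free sketch of the lookup it implies, mirroring whitespace-split WordLevel behavior without using the `tokenizers` library:

```python
import json

with open("tokenizer.json", "r", encoding="utf-8") as f:
    tok = json.load(f)

vocab = tok["model"]["vocab"]
unk_id = vocab[tok["model"]["unk_token"]]


def encode(text):
    # Whitespace split + word-level lookup; unknown names map to <unk>.
    return [vocab.get(word, unk_id) for word in text.split()]


print(encode("force_x torque_z contact_on"))  # [2, 7, 12]
```
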
tokenizer_config.json ADDED

{
  "tokenizer_class": "SignalTokenizer",
  "model_input_names": [
    "input_values",
    "attention_mask"
  ],
  "padding_side": "right",
  "truncation_side": "right",
  "pad_token": "<pad>",
  "unk_token": "<unk>",
  "max_length": 4096,
  "do_normalize": true
}