---
license: apache-2.0
library_name: transformers
tags:
- robotics
- haptics
- embeddings
- multimodal
- encoder
pipeline_tag: feature-extraction
---

# Motoko Embedding 1B

Motoko Embedding 1B is a foundation embedding model for haptic signal representation in robotics.
It encodes raw force, torque, pressure, and vibration signals into fixed-dimension vector embeddings for retrieval, search, and cross-modal fusion.

## Model Summary

- Model type: Encoder-only Transformer
- Parameters: 1B
- Input: Force, torque, pressure, vibration sequences
- Output: Fixed-dimension embedding vectors
- License: Apache 2.0

## Intended Uses

- Semantic search over haptic datasets
- Cross-modal alignment with vision and language
- Haptic RAG pipelines for robotic agents
- Dataset indexing and similarity clustering
- Downstream fine-tuning with LoRA adapters

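The semantic-search use case can be sketched as a cosine-similarity top-k query over a matrix of precomputed embeddings. The embedding dimension and random vectors below are placeholders, not outputs of this model:

```python
import numpy as np

def top_k_similar(query, corpus, k=3):
    """Return indices of the k corpus embeddings most similar to query.

    query: (dim,) vector; corpus: (n, dim) matrix of embeddings.
    """
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity against every corpus row
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
corpus = rng.standard_normal((100, 256)).astype("float32")
query = corpus[42] + 0.01 * rng.standard_normal(256)  # near-duplicate of row 42
print(top_k_similar(query, corpus))  # row 42 should rank first
```

In production this brute-force scan would typically be replaced by an approximate nearest-neighbor index, but the scoring logic is the same.
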
## Architecture

Motoko Embedding 1B uses a signal-aware preprocessing stack followed by an encoder-only Transformer.
Multichannel sensor streams are windowed, normalized, projected into token embeddings, and aggregated into a single fixed-size embedding.

Key design points:

- Temporal patching over multi-axis haptic sequences
- Rotary position embeddings for long-context signal modeling
- Mean pooling over the final hidden states for embedding extraction
- Optional projection head for cross-modal alignment

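The pooling step can be illustrated with a generic masked mean-pooling function; this is a sketch of the technique, not the model's exact implementation:

```python
import numpy as np

def masked_mean_pool(hidden_states, attention_mask):
    """Average final hidden states over valid (unpadded) positions.

    hidden_states: (seq_len, hidden_dim); attention_mask: (seq_len,) of 0/1.
    """
    mask = attention_mask.astype(hidden_states.dtype)[:, None]
    summed = (hidden_states * mask).sum(axis=0)
    count = np.clip(mask.sum(), 1.0, None)  # guard against an all-zero mask
    return summed / count

# Toy hidden states: row i holds the value i in every dimension.
hidden = np.ones((8, 4), dtype="float32") * np.arange(8, dtype="float32")[:, None]
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # only the first 4 positions are valid
print(masked_mean_pool(hidden, mask))  # -> [1.5 1.5 1.5 1.5]
```

Masking matters here: padded positions would otherwise dilute the embedding of short sequences.
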
## Input Format

The model expects synchronized haptic sequences containing one or more of the following modalities:

- Force
- Torque
- Pressure
- Vibration

Default sensor assumptions are defined in [`configs/sensor_config.yaml`](./configs/sensor_config.yaml).
Signal normalization and windowing parameters are defined in [`preprocessor/preprocessor_config.json`](./preprocessor/preprocessor_config.json).

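As an illustration of the kind of windowing and per-channel normalization such a preprocessor applies (the window and hop sizes below are assumed values, not the ones in `preprocessor_config.json`):

```python
import numpy as np

def window_and_normalize(signal, window=256, hop=128):
    """Slice a (timesteps, channels) signal into overlapping windows,
    z-scoring each channel with its global mean and std first."""
    mean = signal.mean(axis=0)
    std = signal.std(axis=0) + 1e-8  # avoid division by zero on flat channels
    normed = (signal - mean) / std
    starts = range(0, signal.shape[0] - window + 1, hop)
    return np.stack([normed[s : s + window] for s in starts])

# Synthetic 12-channel stream, e.g. two 6-axis force/torque sensors.
sample = np.random.randn(1024, 12).astype("float32")
windows = window_and_normalize(sample)
print(windows.shape)  # (7, 256, 12)
```
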
## Repository Layout

```text
.
├── README.md
├── config.json
├── tokenizer_config.json
├── tokenizer.json
├── model/
│   ├── model.safetensors
│   └── model.safetensors.index.json
├── preprocessor/
│   ├── preprocessor_config.json
│   └── feature_extractor.py
├── configs/
│   ├── training_config.yaml
│   └── sensor_config.yaml
├── examples/
│   ├── inference.py
│   ├── embedding_search.py
│   └── cross_modal.py
└── .gitattributes
```

## Key Files

| File | Purpose |
| --- | --- |
| `config.json` | Encoder architecture: layers, heads, hidden size, projection dimensions |
| `configs/sensor_config.yaml` | Sensor input specs: axes, sequence length, sampling rate |
| `preprocessor/preprocessor_config.json` | Signal normalization, windowing, padding behavior |
| `preprocessor/feature_extractor.py` | Converts raw haptic arrays into encoder-ready tensors |
| `examples/embedding_search.py` | Vector similarity search over haptic embeddings |
| `examples/cross_modal.py` | Aligns haptic embeddings with vision or language vectors |

## Usage

### Load the processor

```python
from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")
```

### Basic embedding inference

```python
import numpy as np

from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")

# Dummy 12-channel haptic sequence with 1024 timesteps.
sample = np.random.randn(1024, 12).astype("float32")
features = extractor(sample)

print(features["input_values"].shape)
print(features["attention_mask"].shape)
```

See [`examples/inference.py`](./examples/inference.py) for a complete example.

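To sketch how a projection head can map haptic embeddings into a shared space for cross-modal comparison: the dimensions and randomly initialized weights below are placeholders, and a trained checkpoint would supply real ones (the repository's own example lives in [`examples/cross_modal.py`](./examples/cross_modal.py)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions; a trained checkpoint defines the real ones.
haptic_dim, shared_dim = 1024, 512

# Linear projection head with random weights standing in for trained ones.
W = rng.standard_normal((haptic_dim, shared_dim)) / np.sqrt(haptic_dim)

def align_score(haptic_emb, other_emb):
    """Cosine similarity between a projected haptic embedding and another
    modality's embedding in the shared space."""
    h = haptic_emb @ W
    h = h / np.linalg.norm(h)
    o = other_emb / np.linalg.norm(other_emb)
    return float(h @ o)

haptic = rng.standard_normal(haptic_dim)
text = rng.standard_normal(shared_dim)
score = align_score(haptic, text)
print(score)  # a value in [-1, 1]
```
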
## Training

Baseline training parameters are provided in [`configs/training_config.yaml`](./configs/training_config.yaml).
These values are intended as a starting point for pretraining or continued domain adaptation, not as a claim of the exact recipe used for a released checkpoint.

## Limitations

- Performance depends heavily on sensor calibration and synchronization quality.
- Out-of-distribution hardware setups may require updated preprocessing statistics.
- Cross-modal alignment quality depends on the paired supervision used during training.
- This repository scaffold does not include production weights.

## Citation

```bibtex
@misc{motoko_embedding_1b,
  title = {Motoko Embedding 1B},
  author = {Motoko},
  year = {2026},
  howpublished = {\url{https://huggingface.co/}}
}
```
|