---
license: apache-2.0
library_name: transformers
tags:
- robotics
- haptics
- embeddings
- multimodal
- encoder
pipeline_tag: feature-extraction
---

# Motoko Embedding 1B

Motoko Embedding 1B is a foundation embedding model for haptic signal representation in robotics.
It encodes raw force, torque, pressure, and vibration signals into fixed-dimension vector embeddings for retrieval, search, and cross-modal fusion.

## Model Summary

- Model type: Encoder-only Transformer
- Parameters: 1B
- Input: Force, torque, pressure, vibration sequences
- Output: Fixed-dimension embedding vectors
- License: Apache 2.0

## Intended Uses

- Semantic search over haptic datasets
- Cross-modal alignment with vision and language
- Haptic RAG pipelines for robotic agents
- Dataset indexing and similarity clustering
- Downstream fine-tuning with LoRA adapters

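The semantic-search use case can be sketched as a cosine-similarity top-k query over a matrix of precomputed embeddings. The embedding dimension and random vectors below are placeholders, not outputs of this model:

```python
import numpy as np

def top_k_similar(query, corpus, k=3):
    """Return indices of the k corpus embeddings most similar to query.

    query: (dim,) vector; corpus: (n, dim) matrix of embeddings.
    """
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q  # cosine similarity against every corpus row
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
corpus = rng.standard_normal((100, 256)).astype("float32")
query = corpus[42] + 0.01 * rng.standard_normal(256)  # near-duplicate of row 42
print(top_k_similar(query, corpus))  # row 42 should rank first
```

In production this brute-force scan would typically be replaced by an approximate nearest-neighbor index, but the scoring logic is the same.
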
## Architecture

Motoko Embedding 1B uses a signal-aware preprocessing stack followed by an encoder-only Transformer.
Multichannel sensor streams are windowed, normalized, projected into token embeddings, and aggregated into a single fixed-size embedding.

Key design points:

- Temporal patching over multi-axis haptic sequences
- Rotary position embeddings for long-context signal modeling
- Mean pooling over the final hidden states for embedding extraction
- Optional projection head for cross-modal alignment

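The pooling step can be illustrated with a generic masked mean-pooling function; this is a sketch of the technique, not the model's exact implementation:

```python
import numpy as np

def masked_mean_pool(hidden_states, attention_mask):
    """Average final hidden states over valid (unpadded) positions.

    hidden_states: (seq_len, hidden_dim); attention_mask: (seq_len,) of 0/1.
    """
    mask = attention_mask.astype(hidden_states.dtype)[:, None]
    summed = (hidden_states * mask).sum(axis=0)
    count = np.clip(mask.sum(), 1.0, None)  # guard against an all-zero mask
    return summed / count

# Toy hidden states: row i holds the value i in every dimension.
hidden = np.ones((8, 4), dtype="float32") * np.arange(8, dtype="float32")[:, None]
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # only the first 4 positions are valid
print(masked_mean_pool(hidden, mask))  # -> [1.5 1.5 1.5 1.5]
```

Masking matters here: padded positions would otherwise dilute the embedding of short sequences.
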
## Input Format

The model expects synchronized haptic sequences containing one or more of the following modalities:

- Force
- Torque
- Pressure
- Vibration

Default sensor assumptions are defined in [`configs/sensor_config.yaml`](./configs/sensor_config.yaml).
Signal normalization and windowing parameters are defined in [`preprocessor/preprocessor_config.json`](./preprocessor/preprocessor_config.json).

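As an illustration of the kind of windowing and per-channel normalization such a preprocessor applies (the window and hop sizes below are assumed values, not the ones in `preprocessor_config.json`):

```python
import numpy as np

def window_and_normalize(signal, window=256, hop=128):
    """Slice a (timesteps, channels) signal into overlapping windows,
    z-scoring each channel with its global mean and std first."""
    mean = signal.mean(axis=0)
    std = signal.std(axis=0) + 1e-8  # avoid division by zero on flat channels
    normed = (signal - mean) / std
    starts = range(0, signal.shape[0] - window + 1, hop)
    return np.stack([normed[s : s + window] for s in starts])

# Synthetic 12-channel stream, e.g. two 6-axis force/torque sensors.
sample = np.random.randn(1024, 12).astype("float32")
windows = window_and_normalize(sample)
print(windows.shape)  # (7, 256, 12)
```
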
## Repository Layout

```text
.
├── README.md
├── config.json
├── tokenizer_config.json
├── tokenizer.json
├── model/
│   ├── model.safetensors
│   └── model.safetensors.index.json
├── preprocessor/
│   ├── preprocessor_config.json
│   └── feature_extractor.py
├── configs/
│   ├── training_config.yaml
│   └── sensor_config.yaml
├── examples/
│   ├── inference.py
│   ├── embedding_search.py
│   └── cross_modal.py
└── .gitattributes
```

## Key Files

| File | Purpose |
| --- | --- |
| `config.json` | Encoder architecture: layers, heads, hidden size, projection dimensions |
| `configs/sensor_config.yaml` | Sensor input specs: axes, sequence length, sampling rate |
| `preprocessor/preprocessor_config.json` | Signal normalization, windowing, padding behavior |
| `preprocessor/feature_extractor.py` | Converts raw haptic arrays into encoder-ready tensors |
| `examples/embedding_search.py` | Vector similarity search over haptic embeddings |
| `examples/cross_modal.py` | Aligns haptic embeddings with vision or language vectors |

## Usage

### Load the processor

```python
from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")
```

### Basic embedding inference

```python
import numpy as np

from preprocessor.feature_extractor import HapticFeatureExtractor

extractor = HapticFeatureExtractor.from_pretrained(".")

# Dummy 12-channel haptic sequence with 1024 timesteps.
sample = np.random.randn(1024, 12).astype("float32")
features = extractor(sample)

print(features["input_values"].shape)
print(features["attention_mask"].shape)
```

See [`examples/inference.py`](./examples/inference.py) for a complete example.

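To sketch how a projection head can map haptic embeddings into a shared space for cross-modal comparison: the dimensions and randomly initialized weights below are placeholders, and a trained checkpoint would supply real ones (the repository's own example lives in [`examples/cross_modal.py`](./examples/cross_modal.py)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions; a trained checkpoint defines the real ones.
haptic_dim, shared_dim = 1024, 512

# Linear projection head with random weights standing in for trained ones.
W = rng.standard_normal((haptic_dim, shared_dim)) / np.sqrt(haptic_dim)

def align_score(haptic_emb, other_emb):
    """Cosine similarity between a projected haptic embedding and another
    modality's embedding in the shared space."""
    h = haptic_emb @ W
    h = h / np.linalg.norm(h)
    o = other_emb / np.linalg.norm(other_emb)
    return float(h @ o)

haptic = rng.standard_normal(haptic_dim)
text = rng.standard_normal(shared_dim)
score = align_score(haptic, text)
print(score)  # a value in [-1, 1]
```
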
## Training

Baseline training parameters are provided in [`configs/training_config.yaml`](./configs/training_config.yaml).
These values are intended as a starting point for pretraining or continued domain adaptation, not as a claim of the exact recipe used for a released checkpoint.

## Limitations

- Performance depends heavily on sensor calibration and synchronization quality.
- Out-of-distribution hardware setups may require updated preprocessing statistics.
- Cross-modal alignment quality depends on the paired supervision used during training.
- This repository scaffold does not include production weights.

## Citation

```bibtex
@misc{motoko_embedding_1b,
  title = {Motoko Embedding 1B},
  author = {Motoko},
  year = {2026},
  howpublished = {\url{https://huggingface.co/}}
}
```
|