ARCHEX

The open-source intelligence layer for human augmentation.

Origin. Command. Evolution.

License: Apache 2.0 · Model · Demo · Discord


What CUDA did for GPU computing — ARCHEX does for the human body.


What is ARCHEX?

ARCHEX is an open-source medical AI stack built for human augmentation.

We are not building another chatbot. We are building the intelligence layer that will power the next generation of prosthetic limbs, neural interfaces, cochlear implants, and human augmentation devices — the same way NVIDIA's CUDA became the invisible backbone of every AI system on earth.

Few people. One mission. The window is open right now.

OpenAI is building for chat. Google is building for search. Nobody is building the AI brain for the 500 million people who live with limb loss, paralysis, hearing loss, or vision impairment.

That is what ARCHEX is for.


The Stack

ARCHEX
├── core/                  # Model training pipeline
│   └── archex-medlm/      # Medical language model (open weights)
│
├── spectra/               # Synthetic biosignal environment
│   ├── generators/        # EMG/EEG synthetic signal generation
│   └── sim2real/          # Sim-to-real transfer utilities
│
├── meridian/              # Medical knowledge ingestion pipeline
│   ├── pubmed/            # PubMed Open Access ingestion
│   └── clinical/          # Clinical notes pipeline
│
├── signal-encoder/        # 1D-CNN biosignal → LLM embedding bridge
│   ├── models/            # Pretrained signal encoders
│   └── tokenizer/         # BioSignal tokenization
│
└── sdk/                   # ARCHEX SDK for bionic device integration
    ├── python/
    ├── cpp/               # For embedded/real-time systems
    └── examples/

Why now?

Signal                                                        What it means
Prosthetic limb market growing 6.8% YoY                       Demand is accelerating
EMG-controlled prosthetics still use 1990s pattern matching   The AI gap is enormous
Open-source LLMs now match GPT-3.5 at 7B params               We can afford to train this
Synthetic biosignal generation is finally viable              Data is no longer the bottleneck
No foundation model exists for bionic control signals         The category is unclaimed

The BioSignal Tokenization Problem

Every LLM processes text as tokens. EMG signals, EEG recordings, and neural spike trains are not text — they are continuous time-series with temporal dependencies, amplitude distributions, and frequency-domain features that existing tokenizers destroy.

ARCHEX solves this with a learnable 1D-CNN signal encoder that maps biosignal windows directly into the LLM's embedding space. A 250ms EMG window becomes a vector the transformer can reason about — alongside clinical notes, patient history, and movement intent labels.

This is the technical foundation for a bionic limb that doesn't just detect "open hand" vs "close hand" — but understands why the user wants to move, in the context of their clinical history and rehabilitation progress.
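The encoder idea above can be sketched in a few lines. This is a minimal NumPy toy, not the released architecture: the filter count, kernel size, and 64-dimensional output are illustrative assumptions, and the random weights stand in for trained ones.

```python
import numpy as np

def conv1d(x, kernels, stride=2):
    """Valid-mode strided 1D convolution: x is (n_samples,), kernels is (n_filters, k)."""
    n_filters, k = kernels.shape
    n_out = (len(x) - k) // stride + 1
    out = np.empty((n_filters, n_out))
    for i in range(n_out):
        out[:, i] = kernels @ x[i * stride : i * stride + k]
    return out

class SignalEncoderSketch:
    """Toy 1D-CNN that maps one biosignal window to a fixed-size embedding."""
    def __init__(self, n_filters=16, kernel=7, d_model=64, seed=0):
        rng = np.random.default_rng(seed)
        self.kernels = rng.standard_normal((n_filters, kernel)) * 0.1
        self.proj = rng.standard_normal((d_model, n_filters)) * 0.1

    def encode(self, window):
        feats = np.maximum(conv1d(window, self.kernels), 0.0)  # ReLU feature maps
        pooled = feats.mean(axis=1)   # global average pool over time
        return self.proj @ pooled     # project into the LLM's embedding space

# A 250 ms EMG window sampled at 1 kHz -> 250 samples -> one 64-d embedding.
emg_window = np.random.default_rng(1).standard_normal(250)
embedding = SignalEncoderSketch().encode(emg_window)
print(embedding.shape)  # (64,)
```

The key property is the output shape: whatever the window length, the encoder emits one vector sized to the transformer's embedding dimension, so signal windows can sit in the same sequence as text tokens.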


SPECTRA — Synthetic Biosignal Environment

SPECTRA generates physiologically realistic biosignal training data without hardware, participants, or cost. Built on published EMG spectral profiles, it produces unlimited synthetic training samples across 6 prosthetic hand gesture classes.

No hardware. No participants. No ethics approval required. Unlimited data at ₹0.

This is the sim-to-real strategy — the same approach Boston Dynamics uses for robot locomotion. Train entirely in simulation, deploy to real hardware.
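A common way to synthesize surface EMG, and one consistent with the "published spectral profiles" approach described above, is band-limited Gaussian noise shaped by an activation envelope. The sketch below assumes that model; the 20–450 Hz band, the envelope shape, and the gesture names are illustrative, not SPECTRA's actual parameters.

```python
import numpy as np

def synth_emg(duration_s=1.0, fs=1000, band=(20.0, 450.0), seed=None):
    """Synthetic surface EMG: Gaussian noise band-limited in the frequency
    domain, then modulated by a smooth contraction envelope."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * fs)
    white = rng.standard_normal(n)
    # Keep only the EMG frequency band.
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[(freqs < band[0]) | (freqs > band[1])] = 0.0
    signal = np.fft.irfft(spectrum, n)
    # Contraction ramps up and back down over the window.
    envelope = np.sin(np.linspace(0.0, np.pi, n)) ** 2
    return signal * envelope

# One labelled sample per class, mirroring SPECTRA's 6-gesture setup
# (class names here are placeholders).
gestures = ["rest", "open", "close", "pinch", "point", "supinate"]
dataset = {g: synth_emg(seed=i) for i, g in enumerate(gestures)}
print(len(dataset), dataset["open"].shape)  # 6 (1000,)
```

Because every sample is generated from a seed, the dataset is unlimited, reproducible, and free, which is the whole point of the sim-to-real strategy.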


MERIDIAN — Medical Knowledge Pipeline

MERIDIAN ingests, cleans, and prepares open-license medical knowledge for training. Sources include PubMed Open Access (34M papers), PhysioNet biosignal collections, and clinical note datasets. All open-license, all verifiable, all reproducible.
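The core of such a pipeline is license filtering, markup stripping, and deduplication. Here is a minimal sketch; the record schema (`pmid`, `license`, `abstract`) and the allowed-license set are assumptions for illustration, not MERIDIAN's actual interface.

```python
import hashlib
import re

def clean_abstract(text):
    """Strip residual XML/HTML tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def prepare(records, allowed_licenses=frozenset({"CC BY", "CC0"})):
    """Filter to open licenses, clean text, deduplicate by content hash."""
    seen, out = set(), []
    for rec in records:
        if rec.get("license") not in allowed_licenses:
            continue
        text = clean_abstract(rec["abstract"])
        digest = hashlib.sha256(text.encode()).hexdigest()
        if text and digest not in seen:
            seen.add(digest)
            out.append({"pmid": rec["pmid"], "text": text, "license": rec["license"]})
    return out

sample = [
    {"pmid": "1", "license": "CC BY", "abstract": "<p>EMG decoding for  prosthetics.</p>"},
    {"pmid": "2", "license": "CC BY", "abstract": "<p>EMG decoding for prosthetics.</p>"},
    {"pmid": "3", "license": "All rights reserved", "abstract": "Closed-license paper."},
]
print(len(prepare(sample)))  # 1: the duplicate and the closed-license record are dropped
```

Hashing the cleaned text (rather than the raw record) is what makes the dedupe step catch near-identical copies that differ only in markup or whitespace.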


Phases

Phase 0 — Foundation ✓ Complete

  • SPECTRA synthetic EMG generator — 6 gesture classes, ₹0 cost
  • ARCHEX MedLM v0 trained and live on HuggingFace
  • Public demo running at huggingface.co/spaces/archex-ai/archex-medlm-demo
  • Training pipeline validated on free hardware

Phase 1 — Medical Intelligence (Now → Month 3)

  • MERIDIAN ingests 50,000 PubMed abstracts
  • Model becomes genuinely medically literate
  • REST API launch — medical consultation at scale
  • First paying customer

Phase 2 — Signal Intelligence (Month 4–6)

  • SPECTRA v2 — adds EEG, ECG synthetic generation
  • ARCHEX Signal Model — multimodal text + biosignals
  • Synthetic dataset sales to research labs
  • arXiv preprint: "BioSignal Tokenization for Multimodal Medical LLMs"

Phase 3 — Platform (Month 7–9)

  • ARCHEX SDK v1.0 — bionic device integration in 3 lines of code
  • Enterprise API for medical device companies
  • On-prem deployment for hospitals
  • First commercial bionic device integration

The CUDA Parallel

NVIDIA did not win because its GPUs were the best. It won because it gave developers CUDA for free — and once developers wrote CUDA code, the hardware lock-in was automatic and invisible.

ARCHEX replicates this at the AI + bionics layer.

# This is what we are building toward.
# A prosthetic hand manufacturer writes this once.
# Then they cannot switch without rewriting everything.

from archex.sdk import BionicInterface, SignalEncoder

interface = BionicInterface(device="prosthetic_hand_v2")
encoder = SignalEncoder.from_pretrained("archex-ai/signal-encoder-emg-v1")

# Real-time EMG → intent → motor command
# 200ms end-to-end latency
# Running on a Raspberry Pi 4

for signal_window in interface.stream(sample_rate=250):
    intent = encoder.classify(signal_window)
    interface.actuate(intent)

Getting Started

git clone https://github.com/archex-ai/archex
cd archex
pip install -e ".[all]"

# Run the medical QA demo
python -m archex.demo.medqa

# Start SPECTRA — synthetic biosignal generation
python -m archex.spectra.generators.emg --gestures 6 --samples 500

Data Sources

All data used in ARCHEX training is open-license or freely available for credentialed research:

Dataset                 Domain                       Size           License
PubMed Open Access      Medical literature           34M papers     NLM Open Access
MIMIC-IV                Clinical notes               40K patients   PhysioNet Credentialed
PhysioNet Collections   Biosignals (ECG, EEG, EMG)   4TB            Open Data Commons
BCI Competition IV      Motor imagery EEG            9 subjects     Free research use

Roadmap

  • SPECTRA synthetic biosignal generator
  • ARCHEX MedLM v0 — open weights on HuggingFace
  • Live demo — huggingface.co/spaces/archex-ai/archex-medlm-demo
  • Training pipeline on free hardware
  • MERIDIAN — PubMed ingestion pipeline
  • ARCHEX MedLM v1 — medically literate
  • Signal encoder pretrained weights
  • ARCHEX Signal Model — multimodal
  • SDK v1.0
  • arXiv paper
  • First bionic device integration

Contributing

ARCHEX is built for the long term. If you are a developer, ML researcher, biomedical engineer, or someone who uses or builds bionic devices — you belong here.

Good first issues:  [good first issue]
Research bounties:  [research]
Medical domain:     [biomedical]

Join the Discord — introduce yourself.


License

Apache 2.0 — use it, build on it, sell with it. The only thing we ask: if you extend the BioSignal Tokenization spec, contribute it back. We are building a standard, not a walled garden.


Built by humans, for humans who need more than what they were given.

ARCHEX — archex.ai
