PersonaPlex 7B Hybrid — Distilled + LLM Reasoning

Distilled NF4 weights with a hybrid architecture: PersonaPlex handles voice I/O, Qwen/Ollama handles reasoning.

Architecture

User Voice → PersonaPlex (ASR) → Text → Qwen 27B/122B (reasoning) → Text → PersonaPlex (TTS) → User Voice

PersonaPlex processes audio in real time (full-duplex). When it produces a complete sentence, the hybrid agent intercepts it and routes it through a local LLM to generate the response.
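The interception step can be sketched as a simple queue-based loop. This is a hypothetical illustration; `hybrid_loop` and the queue names are not the actual PersonaPlex API.

```python
import queue

def hybrid_loop(asr_sentences: queue.Queue, tts_input: queue.Queue, reason):
    """Intercept complete sentences from ASR and route them through the LLM.

    `reason` is any callable mapping a user sentence to a reply,
    e.g. a call into Qwen via a local Ollama server.
    """
    while True:
        sentence = asr_sentences.get()
        if sentence is None:       # sentinel: shut the loop down
            break
        reply = reason(sentence)   # reasoning hop through the local LLM
        tts_input.put(reply)       # PersonaPlex speaks the LLM's answer
```

In the real system the ASR and TTS sides run concurrently; the sketch only shows the routing logic in between.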

Distillation Results

Trained for 5 epochs on 3,000 samples from the bf16 teacher (73 min on A100).

Model             Token Match vs bf16   Output Quality
bf16 (teacher)    100%                  Reference
NF4 raw (before)  75%                   Coherent but divergent
NF4 distilled     90%                   Close match to teacher

Training loss: 0.5823 β†’ 0.0697 (88% reduction over 5 epochs).

Quick Start

# Clone the repo
git clone https://github.com/robit-man/personaplex.git
cd personaplex

# Start with hybrid mode (PersonaPlex voice + Qwen reasoning)
source personaplex-setup/venv/bin/activate
export PYTHONPATH="personaplex-setup/moshi:"
export HYBRID_LLM_MODEL="open-agents-qwen35:27b"
# Weights: use the local student_best.pt or download it from this repo
python -m moshi.server --moshi-weight student_best.pt \
  --device cuda --hybrid --host 0.0.0.0

# For Qwen 122B (deeper reasoning, higher latency):
export HYBRID_LLM_MODEL="open-agents-qwen35:122b"
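Under the hood, the reasoning hop is just an HTTP request to the local Ollama server. A minimal stdlib-only sketch, assuming Ollama's standard `/api/generate` endpoint on the default port; `build_request` and `ask_llm` are illustrative helpers, not part of this repo:

```python
import json
import os
import urllib.request

def build_request(prompt, model=None):
    """Payload for Ollama's /api/generate, honoring HYBRID_LLM_MODEL."""
    model = model or os.environ.get("HYBRID_LLM_MODEL", "open-agents-qwen35:27b")
    return {"model": model, "prompt": prompt, "stream": False}

def ask_llm(prompt, host="http://localhost:11434"):
    """Send one prompt to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream: False` Ollama returns a single JSON object; a production agent would stream tokens instead to cut perceived latency.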

LLM Model Selection

Model                  Latency   Best For
Qwen 3.5:9B            ~1s       Quick exchanges
Qwen 3.5:27B           ~2s       General conversation (recommended)
Qwen 3.5:122B          ~5-10s    Complex analysis
Nemotron 3 Super 120B  ~5-10s    Tool calling, codebase analysis
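The table above can be encoded as a simple latency-budget picker. The thresholds mirror the listed latencies; the `9b` tag is an assumption following the same naming pattern as the `27b`/`122b` tags shown in Quick Start:

```python
def pick_model(max_latency_s: float) -> str:
    """Pick the largest Qwen tier that fits a latency budget (illustrative)."""
    if max_latency_s < 2:
        return "open-agents-qwen35:9b"    # ~1s, quick exchanges
    if max_latency_s < 5:
        return "open-agents-qwen35:27b"   # ~2s, general conversation
    return "open-agents-qwen35:122b"      # ~5-10s, complex analysis
```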

Files

File               Description
student_best.pt    Distilled bf16 weights (15.6 GB)
training_log.json  Training metrics
distill_v2.py      Distillation training script

Anti-Call-Center Training

The prompts used for distillation explicitly enforce:

  • No self-naming (model never introduces itself by name)
  • No "how can I help" patterns
  • Direct, natural responses instead of customer service scripts
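A minimal post-filter for these constraints might look like the following; the banned phrases are illustrative, not the actual distillation prompts:

```python
# Hypothetical style check enforcing the anti-call-center constraints above.
BANNED_PHRASES = (
    "how can i help",     # customer-service opener
    "how may i assist",   # customer-service opener
    "my name is",         # self-naming
)

def violates_style(reply: str) -> bool:
    """Return True if a reply falls into a call-center pattern."""
    lowered = reply.lower()
    return any(phrase in lowered for phrase in BANNED_PHRASES)
```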

Note: The base PersonaPlex model was trained on call-center data, so these tendencies are baked into its weights. The hybrid approach sidesteps this by routing responses through an LLM that follows the prompt correctly.

Training Config

{
  "epochs": 5,
  "lr": 5e-6,
  "temperature": 2.0,
  "alpha_kl": 0.7,
  "alpha_hard": 0.3,
  "total_samples": 3000,
  "optimizer": "AdamW",
  "scheduler": "CosineAnnealingLR"
}
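The config implies a standard knowledge-distillation objective: a KL term on temperature-softened logits weighted by `alpha_kl`, plus a hard-label cross-entropy term weighted by `alpha_hard`. A PyTorch sketch of that loss under those assumptions (not the actual `distill_v2.py` code):

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, targets,
                 T=2.0, alpha_kl=0.7, alpha_hard=0.3):
    """Combined soft (KL) + hard (CE) distillation loss."""
    # Soft targets: KL between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across T.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the reference tokens.
    ce = F.cross_entropy(student_logits, targets)
    return alpha_kl * kl + alpha_hard * ce
```

With `alpha_kl=0.7` the student mostly imitates the teacher's full output distribution, while the `alpha_hard=0.3` term keeps it anchored to the reference tokens.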

License

Same as base: NVIDIA Open Model License.

Built by open-agents-ai.
