Types of Voice Deepfakes: Techniques, Tools, and Open-Source Methods

Community Article Published January 30, 2026

Voice deepfakes have advanced rapidly over the past few years, moving from research demos to widely accessible tools. Today, anyone with modest technical skills, or even just a browser, can generate or transform voices with surprising realism.

This article provides a high-level, practical overview of:

  • The main types of voice deepfake technologies
  • Common voice-cloning and conversion methods
  • A curated list of widely used tools and platforms (as of September 2025)

The goal is to inform, not promote, so readers can better understand both the capabilities and risks of modern voice synthesis.


1. Types of Voice Deepfakes

Voice Cloning

Voice cloning synthesizes new speech in a specific person’s voice.

Key characteristics:

  • Requires reference audio samples of the target speaker
  • Can generate words the person never actually spoke
  • Often used for narration, localization, or impersonation attacks

This is the most commonly discussed form of voice deepfake due to its potential for misuse.
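Many cloning systems, including the SV2TTS approach listed later, condition a speech synthesizer on a fixed-size speaker embedding computed from the reference audio. The minimal sketch below covers only the pooling and comparison step. It assumes the per-frame feature vectors already come from a trained speaker encoder; here they are toy lists of floats. The helper names `speaker_embedding` and `cosine_similarity` are hypothetical, and the synthesizer and vocoder stages are not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (an all-zero vector is returned as zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def speaker_embedding(frame_features: Sequence[Sequence[float]]) -> Vector:
    """Average per-frame vectors, then normalize (hypothetical d-vector-style pooling).

    In a real system the frame vectors would come from a trained speaker
    encoder; here they are supplied directly for illustration.
    """
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    count = len(frame_features)
    mean = [sum(f[i] for f in frame_features) / count for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings, in the range [-1, 1]."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


if __name__ == "__main__":
    reference = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]  # toy "target speaker" frames
    other = [[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]      # toy "different speaker" frames
    ref_emb = speaker_embedding(reference)
    print(round(cosine_similarity(ref_emb, speaker_embedding(reference)), 3))  # 1.0
    print(cosine_similarity(ref_emb, speaker_embedding(other)) < 0.5)          # True
```

Averaging and normalizing yields one vector per speaker regardless of clip length. This is part of why a short reference sample can be enough to steer a conditioned synthesizer.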


Voice Conversion

Voice conversion transforms one person’s speech to sound like another.

Key characteristics:

  • Converts existing speech rather than generating speech from new text
  • Can operate in near-real time
  • Commonly used for live voice changing, streaming, or singing voice conversion

Unlike voice cloning, the spoken content already exists; the system changes who it sounds like, not what is being said.
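Near-real-time operation generally comes from processing audio in short fixed-size blocks as it arrives. Buffering alone then adds a delay of roughly block size divided by sample rate. For example, 320 samples at 16 kHz is 320 / 16,000 = 0.02 s, or 20 ms, before any model inference time. The sketch below assumes a hypothetical `convert_block` function standing in for the conversion model. Real systems often also use overlapping windows and crossfades, which are omitted here.

```python
from typing import Callable, Iterable, Iterator, List

Block = List[float]


def buffering_latency_ms(block_size: int, sample_rate: int) -> float:
    """Delay introduced by waiting for one full block of samples, in milliseconds."""
    return 1000.0 * block_size / sample_rate


def stream_convert(
    samples: Iterable[float],
    block_size: int,
    convert_block: Callable[[Block], Block],
) -> Iterator[Block]:
    """Group incoming samples into fixed-size blocks and convert each one.

    `convert_block` is a hypothetical stand-in for a voice-conversion model.
    A final partial block is zero-padded so every call sees block_size samples.
    """
    buffer: Block = []
    for s in samples:
        buffer.append(s)
        if len(buffer) == block_size:
            yield convert_block(buffer)
            buffer = []
    if buffer:
        buffer.extend([0.0] * (block_size - len(buffer)))
        yield convert_block(buffer)


if __name__ == "__main__":
    print(buffering_latency_ms(320, 16_000))  # 20.0

    def halve(block: Block) -> Block:
        """Placeholder "conversion": just scales the signal."""
        return [0.5 * x for x in block]

    out = list(stream_convert([1.0] * 700, block_size=320, convert_block=halve))
    print(len(out), len(out[-1]))  # 3 320
```

Smaller blocks reduce buffering delay but give the model less context per call, which is one of the main trade-offs in live voice changing.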


Speech Synthesis (Text-to-Speech)

Speech synthesis generates speech from text, often in entirely artificial voices.

Key characteristics:

  • May not resemble real people
  • Used in virtual assistants, audiobooks, accessibility tools, and games
  • Can still be combined with cloning techniques

While less controversial on its own, TTS becomes a concern when paired with identity mimicry.
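Before any audio is produced, a TTS system usually runs a text front end that turns raw input into speakable words. The sketch below shows one such step, text normalization, under simplifying assumptions: English only, and only standalone integers from 0 to 99 are spelled out. The function names are illustrative. Production front ends also handle dates, currencies, abbreviations, and pronunciation.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0-99 (English)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize_text(text: str) -> str:
    """Replace standalone 1-2 digit numbers with words; leave longer numbers as-is."""
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)


if __name__ == "__main__":
    print(normalize_text("Chapter 3 has 42 pages."))
    # Chapter three has forty-two pages.
```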


2. Voice Cloning Resources & Methods

Below is a curated list of widely used voice-cloning and voice-conversion platforms as of September 5, 2025. This is not exhaustive, but reflects tools most commonly encountered in practice.


Hosted Platforms (Web Apps & APIs)

  • ElevenLabs: Instant and professional voice cloning; multilingual
  • Descript Overdub: Clone your own voice inside the Descript editor
  • Resemble AI: Custom AI voices with API access
  • Murf AI: Voiceovers with rapid and pro cloning options
  • LOVO (Genny): TTS and voice cloning for creators
  • Speechify: Browser-based “My Voice” cloning
  • Typecast: Online TTS with voice cloning and avatars
  • Respeecher: Studio-grade cloning used in film and TV
  • ReadSpeaker: Enterprise custom TTS and voice cloning
  • iSpeech: TTS and voice cloning services
  • Uberduck: Web TTS and voice cloning (including music/rap)
  • Voice.ai: Real-time AI voice changer and voice agent platform
  • Hume AI (EVI 3): Speech-to-speech model with rapid custom voice creation
  • VEED: Simple browser-based voice cloner
  • HeyGen: Video avatars with multilingual voice cloning
  • PlayHT: Studio-style TTS and voice cloning
  • Acapela: Enterprise-grade voice synthesis and cloning
  • Narakeet: TTS and narration tools
  • NaturalReader: TTS with custom voice options

Enterprise Cloud “Custom Voice” Programs

These typically require approval and contractual agreements:

  • Microsoft Azure Custom Neural Voice
  • Amazon Polly Brand Voice
  • Google Cloud Text-to-Speech – Instant Custom Voice

Real-Time Voice Changers (Desktop)

  • Voicemod: Real-time AI voices compatible with cloned inputs

These are often used in gaming, streaming, and live communication.


Open-Source & Self-Hosted Methods

Widely used in research and experimentation:

  • Coqui XTTS: Open-source multilingual voice cloning model (Hugging Face)
  • Bark: Open-source generative audio model
  • RVC (Retrieval-based Voice Conversion): Real-time voice conversion
  • so-vits-svc: Singing voice conversion
  • Real-Time Voice Cloning (SV2TTS): Early approach; lower quality by modern standards
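The “retrieval” in RVC means pulling content features extracted from the input speech toward features stored from the target speaker's training data. The simplified sketch below assumes features are plain lists of floats and uses brute-force search instead of a vector index. Each input frame is matched to its k nearest target features, and those neighbours are averaged with closer ones weighted more heavily (here by inverse squared distance; actual RVC implementations may weight differently). The result is then blended with the original frame. The names `retrieve_and_blend` and `index_rate` are used for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_and_blend(
    frame: Sequence[float],
    target_index: Sequence[Sequence[float]],
    k: int = 2,
    index_rate: float = 0.75,
    eps: float = 1e-8,
) -> Vector:
    """Blend one input feature frame with its k nearest target-speaker features.

    Neighbours are weighted by 1 / squared distance (normalized to sum to 1).
    index_rate=0 keeps the input frame; index_rate=1 uses only retrieved features.
    """
    # Brute-force nearest-neighbour search (a real system would use a vector index).
    neighbours = sorted(target_index, key=lambda t: squared_distance(frame, t))[:k]
    weights = [1.0 / (squared_distance(frame, n) + eps) for n in neighbours]
    total = sum(weights)
    retrieved = [
        sum(w * n[i] for w, n in zip(weights, neighbours)) / total
        for i in range(len(frame))
    ]
    # Linear blend between retrieved target features and the original frame.
    return [index_rate * r + (1.0 - index_rate) * f for r, f in zip(retrieved, frame)]


if __name__ == "__main__":
    target = [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]  # toy target-speaker features
    source_frames = [[0.0, 0.0], [4.0, 4.0]]        # toy input features
    converted = [retrieve_and_blend(f, target) for f in source_frames]
    print([[round(x, 3) for x in f] for f in converted])
```

In this sketch, `index_rate=0` returns the input frame unchanged and `index_rate=1` returns only the retrieved features; values in between trade off the two.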

Notes:

  • Coqui Studio (hosted app) shut down, but Coqui TTS models remain open-source
  • Replica Studios announced shutdown effective June 30, 2025

Final Thoughts

Voice deepfake technology spans a wide spectrum, from benign accessibility tools to high-risk impersonation systems. Understanding how these systems work and where they are deployed is essential for building detection, safeguards, and policy responses.

As these models continue to improve, transparency and informed use will matter as much as technical capability.


How are you currently handling voice authenticity, verification, or detection in your projects?
