May 1st 2025 · 1 min read · #ai #cloning #speech #stt #synthesis #tts #voice #whisper

AI Speech Technologies

This page is a collection of notes and links related to AI speech technologies, including Text-to-Speech (TTS), Speech-to-Text (STT), voice synthesis, voice cloning, and other related frippery in the modern AI space.

Resources

Field	Category	Date	Link	Notes
Generative Audio	Models	2023	bark	a text-prompted generative audio model
Speech Recognition	Libraries	2025	WhisperKit	a Swift package that integrates Whisper with Apple’s CoreML
	Models	2024	WhisperLive	a real-time speech-to-text system based on Whisper
		2024	moonshine	a family of models optimized for fast and accurate automatic speech recognition on resource-constrained devices.
		2023	distil-whisper	a distilled version of whisper that is 6 times faster
		2022	whisper.cpp	a C++ implementation of whisper that can run on consumer hardware
		2022	whisper	a general purpose speech recognition model
	Tools	2024	audapolis	an editor for spoken-word audio with automatic transcription
	Tools	2023	insanely-fast-whisper	an opinionated CLI for audio transcription
Speech Synthesis	Models	2025	csm	a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.
			Orpheus-TTS	an open-source text-to-speech system built on Llama-3b
			chatterbox	a text-to-speech model that can generate expressive speech with a variety of styles and emotions.
		2024	ChatTTS	a text-to-speech model designed specifically for dialogue scenarios, with decent prosody
			Real-Time-Voice-Cloning	a PyTorch implementation of a voice cloning model
			WhisperSpeech	a text-to-speech system built by inverting Whisper
		2023	StyleTTS2	a text-to-speech model that supports style diffusion
	Resources		Training a voice for piper TTS	a detailed walkthrough of how to customize a voice model
	Tools	2025	voice-pro	a tool for doing speech processing and voice cloning
			edge-tts	a text-to-speech module that leverages the Microsoft Edge TTS API
			podcastfy	a tool for generating podcasts from text
		2024	OpenVoice	a tool that enables accurate voice cloning with multi-lingual support and flexible style control.

