OpenVoice: MIT and MyShell Introduce Instantaneous Open-Source Voice Cloning

Startups such as ElevenLabs have raised substantial funding to develop proprietary voice cloning algorithms and audio programs that replicate users' voices. Now an open-source alternative, OpenVoice, has emerged, developed collaboratively by researchers at the Massachusetts Institute of Technology (MIT), Tsinghua University in Beijing, China, and the Canadian AI startup MyShell. OpenVoice distinguishes itself by offering nearly instantaneous open-source voice cloning with granular control over how the cloned voice sounds.

In promoting OpenVoice, MyShell says it lets users "clone voices with unparalleled precision, with granular control of tone, from emotion to accent, rhythm, pauses, and intonation, using just a small audio clip." If that claim holds up, it challenges existing voice cloning platforms with a level of customization they have not offered.

According to Zengyi Qin, one of the lead researchers from MIT and MyShell, the core ethos of MyShell is 'AI for All.' He stated, "MyShell wants to benefit the whole research community. OpenVoice is just a start. In the future, we will even provide grants & dataset & computing power to support the open-source research community."

Using OpenVoice: A User-Friendly Experience

Compared with other voice cloning apps, OpenVoice is easy to use. In unscientific tests, users were able to generate convincing voice clones within seconds, without being constrained to specific texts. OpenVoice can be tried through MyShell's web app and on HuggingFace, which keeps the barrier to entry low.

Users can adjust the cloned voice's style, choosing from defaults like cheerful, sad, friendly, or angry, using a dropdown menu. This functionality provides a diverse range of applications for users seeking customized voice content.

How OpenVoice Works: A Simple Yet Effective Approach

OpenVoice comprises two distinct AI models: a text-to-speech (TTS) model and a tone converter. The TTS model, trained on audio samples from various speakers, controls style parameters, languages, intonation, rhythm, and pauses. The tone converter model, trained on a vast dataset, then transfers the tone color (timbre) of the reference speaker onto the TTS output, so the generated speech sounds like the cloned voice while keeping the style chosen in the first stage.
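
The two-model split can be pictured as a two-stage pipeline. The sketch below is a toy illustration only, not the OpenVoice API: every name in it (Utterance, base_tts, tone_converter, clone_voice) is hypothetical, no audio is produced, and it assumes that the first stage fixes style and language while the second stage changes only whose voice the speech carries.

```python
from dataclasses import dataclass, replace

# Hypothetical names throughout: this is NOT the OpenVoice API, only a toy
# model of a two-stage design (base TTS, then tone converter).


@dataclass(frozen=True)
class Utterance:
    text: str
    style: str       # e.g. "cheerful"; decided by the TTS stage
    language: str    # decided by the TTS stage
    tone_color: str  # whose voice timbre the speech carries


def base_tts(text: str, style: str = "friendly", language: str = "en") -> Utterance:
    """Stage 1: speak the text in a fixed base voice with the requested style."""
    return Utterance(text=text, style=style, language=language,
                     tone_color="base_speaker")


def tone_converter(utterance: Utterance, reference_speaker: str) -> Utterance:
    """Stage 2: swap only the tone color; style, language, and text are untouched."""
    return replace(utterance, tone_color=reference_speaker)


def clone_voice(text: str, reference_speaker: str,
                style: str = "friendly", language: str = "en") -> Utterance:
    """Chain the two stages: style first, speaker identity second."""
    return tone_converter(base_tts(text, style, language), reference_speaker)


if __name__ == "__main__":
    print(clone_voice("Hello there.", reference_speaker="alice", style="cheerful"))
```

Because the second stage never touches style or language, the same reference speaker can be paired with any style the first stage supports, which is one way to read the flexibility claimed for this design.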

MyShell's approach, though conceptually simple, is effective at voice cloning while using significantly fewer compute resources than other methods. This simplicity, coupled with flexible control over styles, emotions, and accents, sets OpenVoice apart.

Behind OpenVoice: MyShell's Vision

Founded in 2023, MyShell positions itself as a decentralized platform for discovering, creating, and staking AI-native apps. With over 400,000 users, MyShell's web app offers more than just voice cloning. It includes various text-based AI characters, bots with different personalities, an animated GIF maker, and user-generated text-based RPGs.

While OpenVoice is open source, MyShell monetizes its platform through monthly subscriptions for users and third-party bot creators, along with charges for AI training data. The company's multifaceted approach aims to make AI accessible and customizable for a broad user base.

OpenVoice's introduction challenges the status quo in voice cloning technology, providing a user-friendly and flexible solution through open-source collaboration. MyShell's commitment to supporting the research community aligns with its broader vision of making AI accessible to everyone.