OpenAI has been rapidly advancing its ChatGPT AI chatbot and its Sora AI video generator over the past year. Now it is unveiling a new tool, Voice Engine, which can create a synthetic voice from just 15 seconds of audio.
According to an OpenAI blog post, reported by The Verge, OpenAI has been running a "small-scale preview" of Voice Engine, a project in development since late 2022. Voice Engine already powers the Read Aloud feature in the ChatGPT app, which reads responses aloud.
Once a voice is created from a brief 15-second sample, it can speak any text in an "emotive and realistic" manner. OpenAI suggests potential applications including education, podcast translation, outreach to remote communities, and support for non-verbal individuals.
Although Voice Engine is not yet widely available, samples it produced can already be heard. The clips OpenAI released sound impressively natural, though some listeners detect a faint robotic undertone.
Safety is a priority.
Concerns about misuse are the main reason Voice Engine is available only as a limited preview. OpenAI wants to further research how to prevent tools like this from being used to spread misinformation or to copy people's voices without their consent.
"We aim to initiate discussions on responsibly using synthetic voices and how society can adapt to these new capabilities," OpenAI states. "Based on these discussions and test results, we'll decide whether and how to deploy this technology widely."
With major elections approaching in the US and UK and generative AI tools improving quickly, trust in AI-generated content, whether audio, text, or video, is a growing concern.
OpenAI acknowledges that synthetic voices pose risks to voice-based authentication and could make phone scams more convincing. Addressing these risks will require ways to keep phone communication secure and to distinguish genuine callers from impostors.